A Method of Gradients for the Calculation of the
Characteristic Roots and Vectors of a Real Symmetric
Matrix¹

Magnus R. Hestenes and William Karush 

Let $A$ be a real symmetric matrix, $\mu(x)$ the Rayleigh quotient formed with a vector $x$,
and $\xi(x)$ the gradient vector of $\mu(x)$. The method of gradients consists in an infinite iteration
of the operation $\bar{x} = x - \alpha(x)\xi(x)$. The convergence of the procedure is proved for several
choices for $\alpha(x)$, and the rate of convergence is studied extensively for one particular $\alpha(x)$.
The directions of the vectors in the sequence are seen to tend to that of the characteristic 
vector belonging to the lowest characteristic value. The method can be used for a numerical 
determination of all characteristic vectors and values. 



I. Introduction 
With a real symmetric matrix

$$A = (a_{ij})$$

is associated the Rayleigh quotient

$$\mu(x) = \frac{(Ax, x)}{(x, x)},$$

whose critical points are the characteristic vectors
of $A$. The gradient of $\mu$ has the direction of²

$$\xi(x) = Ax - \mu x. \tag{1}$$

Suppose now, given a vector $x$, we wish to modify $x$ to
obtain a better approximation $\bar{x}$ to a characteristic
vector $y_{\min}$ corresponding to the minimum characteristic
root $\lambda_{\min} = \min \mu(x)$. It is natural to form

$$\bar{x} = x - \alpha\xi, \qquad \alpha > 0, \tag{2}$$

where $\alpha$ may depend upon $x$. Similarly, to approximate
to a characteristic vector $y_{\max}$ corresponding
to $\lambda_{\max} = \max \mu(x)$ we form

$$\bar{x} = x + \alpha\xi, \qquad \alpha > 0.$$

In the present paper we describe several convergent 
iterative methods based upon this gradient process, 
and investigate the convergence to the characteristic 
roots and vectors of A. The results apply to an 
arbitrary, real symmetric matrix. The methods can 
be phrased to yield directly $y_{\min}$ or $y_{\max}$, as one
wishes. For convenience we direct our attention
mainly to $y_{\min}$.

In prescribing a gradient method one must specify
how the number $\alpha$ is to be chosen at each stage $x^i$ of
the iteration. It is shown that the vectors $x^i$ converge
to the appropriate characteristic vector if $\alpha$ is
any positive constant (independent of $x$) less than
$2/M$, $M = \lambda_{\max} - \lambda_{\min}$. The bulk of the theory is concerned
with this "method of fixed $\alpha$." If we impose
the stricter requirement $\alpha < 1/M$, we obtain in addition
that the gradients $\xi^i$ converge in direction (i.e.,
the unit vectors $\xi^i/|\xi^i|$ converge) to a characteristic
vector; in fact the method will be generalized to yield
all of the characteristic vectors of $A$. As would be
expected, the nature of the convergence is essentially
geometric.

1 The preparation of this paper was sponsored (in part) by the Office of Naval
Research.

2 For convenience we shall refer to $\xi$ as the gradient.

A second "method of optimum $\alpha$" is treated in
which $\alpha$, which now depends upon $x$, is selected in a
certain "best" way. In this method the approximations
$x^i$ converge to a characteristic vector, but the
gradients $\xi^i$ fail to converge in direction.³

The well-known method⁴ of forming powers $A^i x$
can be interpreted as a gradient method in which $\alpha$ is
chosen as $-1/\mu$. Here convergence, in general, is to a
characteristic vector corresponding to the characteristic
root of maximum absolute value. We remark
that, commonly, the gradients $\xi^i$ converge in direction
to a characteristic vector corresponding to a root of
next highest absolute value.
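The identification of the power method with the choice $\alpha = -1/\mu$ can be checked numerically. The following is only an illustrative sketch: the 3 by 3 matrix, the starting vector, and the helper names (`matvec`, `dot`, `rayleigh`, `gradient_step`) are assumptions made for the example and do not come from the paper. It forms $x - \alpha\xi$ with $\alpha = -1/\mu$ and compares the result with $Ax/\mu$.

```python
def matvec(A, x):
    """Return the product A x for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def dot(x, y):
    """Inner product (x, y)."""
    return sum(a * b for a, b in zip(x, y))

def rayleigh(A, x):
    """Rayleigh quotient mu(x) = (Ax, x) / (x, x)."""
    return dot(matvec(A, x), x) / dot(x, x)

def gradient_step(A, x, alpha):
    """One step x - alpha * xi, with xi = A x - mu(x) x as in eq (1)."""
    mu = rayleigh(A, x)
    xi = [ax - mu * xj for ax, xj in zip(matvec(A, x), x)]
    return [xj - alpha * g for xj, g in zip(x, xi)]

# Hypothetical example data (not from the paper).
A = [[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]]
x = [1.0, 2.0, 3.0]

mu = rayleigh(A, x)
stepped = gradient_step(A, x, -1.0 / mu)    # gradient step with alpha = -1/mu
powered = [ax / mu for ax in matvec(A, x)]  # scaled power-method iterate A x / mu
max_diff = max(abs(p - q) for p, q in zip(stepped, powered))
```

Under these assumptions `max_diff` is at rounding level, so the two iterates agree up to the scalar factor $1/\mu$, which does not affect direction.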

The chief virtue of the gradient methods seems to 
lie in their simplicity. They are not put forth as 
rapid procedures for a hand computer, but rather as 
processes that might be adapted to automatic com- 
puting machinery. A survey of methods for calcu- 
lating characteristic roots and vectors of (more 
general) matrices may be found in Hammersley.⁵

II. Properties of Symmetric Matrices 

In this section we collect for reference some well- 
known facts on symmetric matrices. 



3 An extension of the method of optimum $\alpha$ to more general problems has been
outlined by L. V. Kantorovitch, On an effective method of solving extremal
problems for quadratic functionals, C. R. (Doklady) Acad. Sci. URSS (N. S.)
48, 455-460 (1945). These results are closely related to some unpublished work of
M. R. Hestenes.

4 See H. Hotelling, Analysis of a complex of statistical variables into principal
components, J. Educ. Psych. 24, 417-441 and 498-520 (1933). In this paper,
Hotelling treats the symmetric matrix. For the extension to nonsymmetric
matrices, see A. C. Aitken, Studies in practical mathematics II. The evaluation
of the latent roots and latent vectors of a matrix, Proc. Roy. Soc. Edinburgh [A]
57, 269-304 (1937).

5 The numerical reduction of non-singular matrix pencils, Phil. Mag. 40, 783-807
(1949).

Consider, for the moment, the space of complex
vectors $x = (b_1, b_2, \ldots, b_n)$ over the field of complex
numbers. A number $\Lambda$ is called a characteristic root
(number, value) of an arbitrary matrix $B$ in case
there exists a nonnull vector $x$ such that

$$Bx = \Lambda x.$$

The vector $x$ is called a characteristic vector; we shall
say that it belongs to the characteristic root $\Lambda$.

For a real symmetric matrix A the characteristic 
roots are real and the characteristic vectors can be 
chosen to be real. Accordingly we henceforth limit
ourselves to the field of real numbers and to real
vectors. We let $\mathcal{E}$ denote the space of real vectors
$x$. We use "$(x, y)$" for inner product, and "$|x|$" for
length in this space. We note a fundamental
property of $A$:

$$(Ax, y) = (x, Ay). \tag{3}$$

Let the distinct characteristic roots of $A$ be denoted by

$$\Lambda_1 < \Lambda_2 < \cdots < \Lambda_p, \qquad p \geq 2,$$

(eliminating the trivial case $p = 1$). With each $\Lambda_k$ is
associated the linear subspace $\mathcal{E}_k$ of $\mathcal{E}$ which is the
set of all characteristic vectors belonging to $\Lambda_k$,
together with the null vector. The dimension of
$\mathcal{E}_k$ is the order of $\Lambda_k$. The subspaces $\mathcal{E}_k$ are
mutually orthogonal and span the space $\mathcal{E}$.

To each nonnull vector $x$ we attach the number

$$\mu(x) = \frac{(Ax, x)}{(x, x)}.$$

The function $\mu$ is homogeneous of degree zero;

$$\mu(bx) = \mu(x), \qquad b \neq 0.$$

For a characteristic vector $y$ belonging to a characteristic
root $\Lambda$,

$$\mu(y) = \Lambda.$$

We have the following well-known relations between
$\mu$ and the characteristic roots.

$$\Lambda_1 = \min_x \mu(x), \qquad \Lambda_p = \max_x \mu(x), \qquad x \neq 0,$$

or, equivalently,

$$\Lambda_1 = \min_x \mu(x), \qquad \Lambda_p = \max_x \mu(x), \qquad |x| = 1.$$

More generally,

$$\Lambda_k = \min_x \mu(x), \qquad x \neq 0 \text{ and orthogonal to } \mathcal{E}_1, \ldots, \mathcal{E}_{k-1};$$

$$\Lambda_k = \max_x \mu(x), \qquad x \neq 0 \text{ and orthogonal to } \mathcal{E}_{k+1}, \ldots, \mathcal{E}_p.$$

To study the behavior of subspaces of $\mathcal{E}$ under
the matrix $A$ it is convenient to think of $A$ as a
linear operator, without regard for coordinate representation.
In general, if $B$ is a symmetric linear
operator on a finite dimensional space $\mathcal{S}$, then the
statements of the preceding three paragraphs hold
(by "symmetry" is meant the property of (3)).
Now suppose that $\mathcal{S}$ is an $m$-dimensional subspace
of $\mathcal{E}$ and let $B$ be the operator $A$ with domain
restricted to $\mathcal{S}$. If $\mathcal{S}$ is invariant under $A$, i.e., if
$x$ belonging to $\mathcal{S}$ implies $Ax$ belongs to $\mathcal{S}$, then $B$
is a symmetric linear operator on $\mathcal{S}$. As such, $B$
has $m$ characteristic roots (counting multiplicities),
and the roots and vectors of $B$ are roots and vectors
of $A$.

The characteristic roots of the matrix $A$ are the
solutions of the polynomial equation

$$|\Lambda I - A| = 0.$$

The multiplicity of a solution $\Lambda_k$ is precisely the dimension
of the subspace $\mathcal{E}_k$, i.e., the order of $\Lambda_k$. We
remark that if $A$ is regarded as a linear operator,
then this equation, formed with any matrix representation
of $A$, yields the characteristic roots.

In our work we shall be dealing always with a
fixed initial vector $x^0$. Let $x^0$ have nonnull projections
in $\mathcal{E}_{k_1}, \ldots, \mathcal{E}_{k_r}$ and only in these subspaces
$\mathcal{E}_k$. We may represent $x^0$ uniquely in the form

$$x^0 = a_1^0 y_1 + a_2^0 y_2 + \cdots + a_r^0 y_r, \qquad a_j^0 > 0, \quad |y_j| = 1, \quad y_j \in \mathcal{E}_{k_j}, \quad j = 1, 2, \ldots, r, \qquad r \geq 2. \tag{4}$$

(We exclude the trivial case $r = 1$.) Let $\mathcal{A}_r$ be the
$r$-dimensional subspace spanned by $y_1, \ldots, y_r$;

$$\mathcal{A}_r = (y_1, y_2, \ldots, y_r).$$

Then $\mathcal{A}_r$ is an invariant subspace under $A$ with
orthonormal basis $(y_1, y_2, \ldots, y_r)$. The characteristic
roots of $A$ relative to this invariant subspace
(i.e., the roots of the linear operator $A$ with domain
restricted to $\mathcal{A}_r$) are

$$\lambda_1 < \lambda_2 < \cdots < \lambda_r,$$

where

$$\lambda_j = \Lambda_{k_j}.$$

In the subspace each root $\lambda_j$ has order one, and has
as corresponding characteristic vector, $y_j$. In the
following pages, after having selected an initial
vector $x^0$, and thus determined the invariant subspace
$\mathcal{A}_r$, we shall be dealing exclusively with vectors
in this subspace.

III. Determination of $\alpha$

This section is discursive; the theorems of the
following sections do not logically depend upon it.

A direct calculation shows that the gradient of $\mu$ is

$$\frac{2\,[Ax - \mu(x)x]}{(x, x)}$$

and hence has the direction of $\xi$, given by eq (1).
As remarked, we shall refer to $\xi$ as the gradient.
Observe the relations

$$(x, \xi) = 0, \qquad (Ax, \xi) = (\xi, \xi). \tag{5}$$
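The two relations (5) are easy to confirm on a small example. The sketch below is illustrative only; the matrix, the vector, and the helper names are assumptions made here, not part of the paper. It evaluates $\mu(x)$ and $\xi(x)$ from eq (1) and computes the residuals of both relations.

```python
def matvec(A, x):
    """Return the product A x for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def dot(x, y):
    """Inner product (x, y)."""
    return sum(a * b for a, b in zip(x, y))

def rayleigh_and_gradient(A, x):
    """Return mu(x) and xi(x) = A x - mu(x) x, as in eq (1)."""
    Ax = matvec(A, x)
    mu = dot(Ax, x) / dot(x, x)
    xi = [ax - mu * xj for ax, xj in zip(Ax, x)]
    return mu, xi

# Hypothetical example data (not from the paper).
A = [[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]]
x = [1.0, 2.0, 3.0]

mu, xi = rayleigh_and_gradient(A, x)
orth_residual = dot(x, xi)                          # (x, xi), expected near 0
sym_residual = dot(matvec(A, x), xi) - dot(xi, xi)  # (Ax, xi) - (xi, xi), expected near 0
```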



The gradient is the direction in which $\mu$ locally
increases most rapidly. Thus if we form $\bar{x}$ from $x$
as in eq (2), with $\alpha > 0$ sufficiently small, we should
expect $\mu(\bar{x})$ to approximate $\lambda_1$ more closely than
$\mu(x)$. Beginning now with $\bar{x}$ and, say, $\bar{\alpha}$, we obtain
from eq (2) a next approximation $\bar{\bar{x}}$, etc. The
question then is to specify $\alpha$ systematically.

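By direct computation we find

$$\mu(x) - \mu(\bar{x}) = \frac{\alpha(2 - \alpha s)}{1 + \alpha^2 t^2}\, t^2,$$

where

$$t = \frac{|\xi|}{|x|}, \qquad s = \mu(\xi) - \mu(x). \tag{6}$$

That is,

$$\mu(x) - \mu(\bar{x}) = f(\alpha)\, t^2, \qquad f(\alpha) = f(\alpha, x) = \frac{\alpha(2 - \alpha s)}{1 + \alpha^2 t^2}. \tag{7}$$

A natural requirement on $\alpha(x)$ in order to expect
convergence of $\mu$ to $\lambda_1$ is then

$$f(\alpha(x)) > 0.$$

A possible choice is $\alpha = \text{const}$, with

$$0 < \alpha < \frac{2}{M}, \tag{8}$$

where

$$M = \Lambda_p - \Lambda_1$$

is the spread of the characteristic roots of $A$. Since
$s < M$ it is easily seen that $f(\alpha) > 0$. The method
of fixed $\alpha$, stemming from this observation, is treated
in sections 5 and 6 and generalized in sections 10
and 11.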
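The decrease formula (7) and the positivity of $f(\alpha)$ under (8) can be illustrated as follows. This is a minimal sketch under stated assumptions: the matrix (whose roots are $2 - \sqrt{2}$, $2$, $2 + \sqrt{2}$, so that $M = 2\sqrt{2}$), the vector, the value $\alpha = 0.5$, and all function names are chosen here for illustration only.

```python
import math

def matvec(A, x):
    """Return the product A x for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def dot(x, y):
    """Inner product (x, y)."""
    return sum(a * b for a, b in zip(x, y))

def rayleigh(A, x):
    """Rayleigh quotient mu(x) = (Ax, x) / (x, x)."""
    return dot(matvec(A, x), x) / dot(x, x)

def decrease_check(A, x, alpha):
    """Return (f(alpha) * t^2, mu(x) - mu(x_bar)) for x_bar = x - alpha * xi."""
    mu = rayleigh(A, x)
    xi = [ax - mu * xj for ax, xj in zip(matvec(A, x), x)]
    t2 = dot(xi, xi) / dot(x, x)          # t^2 with t = |xi| / |x|
    s = rayleigh(A, xi) - mu              # s = mu(xi) - mu(x)
    f = alpha * (2.0 - alpha * s) / (1.0 + alpha * alpha * t2)
    x_bar = [xj - alpha * g for xj, g in zip(x, xi)]
    return f * t2, mu - rayleigh(A, x_bar)

# Hypothetical example data (not from the paper).
A = [[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]]
x = [1.0, 2.0, 3.0]
M = 2.0 * math.sqrt(2.0)   # spread of the roots of this particular A
alpha = 0.5                # satisfies 0 < alpha < 2/M

predicted, actual = decrease_check(A, x, alpha)
```

Here `predicted` is the right side of eq (7) and `actual` is the observed decrease of the Rayleigh quotient; both should agree to rounding and be positive.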

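Another possibility for $\alpha$ is that number $\gamma = \gamma(x)$
which maximizes $f(\alpha)$. From $f'(\gamma) = 0$ we obtain

$$t^2\gamma^2 + s\gamma - 1 = 0, \tag{9}$$

from which

$$\gamma = \frac{2}{s + \sqrt{s^2 + 4t^2}} > 0. \tag{10}$$

Computation shows that $f(\gamma) = \gamma$. Hence

$$\mu(x) - \mu(\bar{x}) = \gamma t^2, \qquad \bar{x} = x - \gamma\xi.$$

The method of optimum $\alpha$, based on this approach, is
treated in section 7. We note the following formula:

$$\gamma = \frac{1}{\mu(\xi) - \mu(\bar{x})}. \tag{11}$$

To verify this we write eq (9) as $\gamma(\gamma t^2 + s) = 1$,
and note that the quantity in parentheses equals

$$\mu(\xi) - \mu(\bar{x}).$$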
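The properties of the optimum step can be checked numerically. In the sketch below (an illustration only; the matrix, vector, grid of trial values, and function names are assumptions, not taken from the paper) $\gamma$ is computed from eq (10), and the identities $f(\gamma) = \gamma$ and (11) are evaluated, together with a crude comparison of $f(\gamma)$ against $f(\alpha)$ on a grid of positive $\alpha$.

```python
import math

def matvec(A, x):
    """Return the product A x for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def dot(x, y):
    """Inner product (x, y)."""
    return sum(a * b for a, b in zip(x, y))

def rayleigh(A, x):
    """Rayleigh quotient mu(x) = (Ax, x) / (x, x)."""
    return dot(matvec(A, x), x) / dot(x, x)

def f(alpha, t2, s):
    """The function f(alpha) of eq (7), with t2 standing for t^2."""
    return alpha * (2.0 - alpha * s) / (1.0 + alpha * alpha * t2)

def optimum_gamma(A, x):
    """Return (gamma, xi, t^2, s), with gamma taken from eq (10)."""
    mu = rayleigh(A, x)
    xi = [ax - mu * xj for ax, xj in zip(matvec(A, x), x)]
    t2 = dot(xi, xi) / dot(x, x)
    s = rayleigh(A, xi) - mu
    gamma = 2.0 / (s + math.sqrt(s * s + 4.0 * t2))
    return gamma, xi, t2, s

# Hypothetical example data (not from the paper).
A = [[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]]
x = [1.0, 2.0, 3.0]

gamma, xi, t2, s = optimum_gamma(A, x)
x_bar = [xj - gamma * g for xj, g in zip(x, xi)]
lhs_11 = 1.0 / gamma
rhs_11 = rayleigh(A, xi) - rayleigh(A, x_bar)           # mu(xi) - mu(x_bar)
best_on_grid = max(f(0.01 * k, t2, s) for k in range(1, 500))
```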

Remarks on the power method, i.e., forming successive
iterates $A^i x$, will be made in section 8. In
that section and in the later section 12 modifications
of the methods of fixed and optimum $\alpha$ will be
suggested.

Before proceeding to any specific gradient method
we formulate some general results in the next section.

IV. General Gradient Method 

We suppose given a real-valued function $\gamma$ such
that $\gamma(x)$ is defined whenever $x \neq 0$ and $x$ is not a
characteristic vector, i.e., whenever $x \neq 0$ and $\xi \neq 0$.
We require that

$$f(\gamma(x)) > 0.$$

(See, e.g., the particular function $\gamma$ determined by
eq (10).) Beginning with a vector $x^0$, expressed in the
form (4), we construct the sequence of vectors
$\{x^i\}$, $i = 0, 1, 2, \ldots$, according to

$$x^{i+1} = x^i - \gamma^i \xi^i, \qquad \text{where } \xi^i = \xi(x^i), \quad \gamma^i = \gamma(x^i). \tag{12}$$

This is equivalent to

$$x^{i+1} = (1 + \gamma^i \mu^i)x^i - \gamma^i A x^i, \qquad \text{where } \mu^i = \mu(x^i). \tag{13}$$

We must assure ourselves that $\{x^i\}$ is well defined.
We note first that if $x^i \neq 0$ and $\xi^i \neq 0$, then $x^{i+1}$ is
defined and not null. For, under the hypotheses,
$x^{i+1}$ is defined and by eq (12),

$$|x^{i+1}|^2 = |x^i|^2 + (\gamma^i)^2 |\xi^i|^2.$$

We thereby distinguish two types of sequences.
The first, a trivial case, is one such that for some
first integer $k$,

$$\xi^k = 0.$$

We terminate the induction at $i = k$; the sequence
$\{x^i\}$, $i = 0, 1, \ldots, k$, is finite with the last vector $x^k$
a characteristic vector. The second type is characterized
as a sequence that is not of the first type.
In this case $\{x^i\}$ and $\{\xi^i\}$ are well-defined infinite
sequences and

$$\xi^i \neq 0, \qquad i = 0, 1, 2, \ldots.$$

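To enable us to formulate statements that hold
simultaneously for finite and infinite sequences we
make the following agreements: (1) if $\{p^i\}$ is an
arbitrary finite sequence, then $\lim p^i$ denotes the
last member of the sequence; (2) any statement or
formula involving an index is to be limited to
meaningful values of the index.

From eq (7),

$$\mu^i - \mu^{i+1} = f(\gamma^i)(t^i)^2 > 0, \qquad t^i = \frac{|\xi^i|}{|x^i|}. \tag{14}$$

Since $\mu^i$ is bounded from below, it follows that
there is a number $\nu$ such that

$$\lim_{i \to \infty} \mu^i = \nu. \tag{15}$$

The vectors $x^i$ and $\xi^i$ lie within the invariant subspace
$\mathcal{A}_r = (y_1, y_2, \ldots, y_r)$; their expansions in terms
of the characteristic vectors $y_j$ are readily determined
from eq (1) and (12). We find

$$x^i = a_1^i y_1 + \cdots + a_r^i y_r, \qquad a_j^i = \{1 + \gamma^{i-1}(\mu^{i-1} - \lambda_j)\}\, a_j^{i-1}, \tag{16}$$

and

$$\xi^i = (\lambda_1 - \mu^i) a_1^i y_1 + \cdots + (\lambda_r - \mu^i) a_r^i y_r. \tag{17}$$

Thus

$$|x^i|^2 = (a_1^i)^2 + \cdots + (a_r^i)^2, \qquad |\xi^i|^2 = (\lambda_1 - \mu^i)^2 (a_1^i)^2 + \cdots + (\lambda_r - \mu^i)^2 (a_r^i)^2. \tag{18}$$

Theorem 4.1. Suppose $\gamma^i$ is such that

$$0 < K_1 \leq f(\gamma^i) \quad \text{and} \quad 0 < \gamma^i$$

for some constant $K_1$. Then

$$\lim_{i \to \infty} \frac{x^i}{|x^i|} = y_1 \quad \text{and} \quad \lim_{i \to \infty} \mu^i = \lambda_1.$$

We limit ourselves to the case when the sequence
$\{x^i\}$ is infinite; by appropriate simplification the
proof below applies as well to the finite sequence.

From eq (14)

$$K_1 (t^i)^2 \leq \mu^i - \mu^{i+1}.$$

Since $\mu^i \to \nu$, it follows that

$$\sum_{i=0}^{\infty} (t^i)^2 < \infty, \qquad \text{and in particular} \quad \lim_{i \to \infty} t^i = 0. \tag{19}$$

Let

$$b_j^i = \frac{a_j^i}{|x^i|}.$$

Then by eq (18)

$$(t^i)^2 = \sum_{j=1}^{r} (\lambda_j - \mu^i)^2 (b_j^i)^2, \qquad \sum_{j=1}^{r} (b_j^i)^2 = 1. \tag{20}$$

Consequently

$$\lim_{i \to \infty} (\lambda_j - \mu^i)^2 (b_j^i)^2 = 0, \qquad j = 1, 2, \ldots, r.$$

Since $(\lambda_j - \mu^i) \to (\lambda_j - \nu)$ it follows from the last equation
of (20) that for some specific value $l$ of the
index $j$,

$$\mu^i \to \lambda_l, \qquad |b_l^i| \to 1, \qquad b_j^i \to 0 \text{ for } j \neq l.$$

We wish to show that $l = 1$. Suppose $l > 1$. Then

$$\frac{|b_1^i|}{|b_l^i|} = \frac{|a_1^i|}{|a_l^i|} \to 0.$$

From the monotonicity of $\mu^i$, we have $\mu^i > \lambda_l > \lambda_1$.
From eq (16)

$$\frac{|a_1^{i+1}|}{|a_l^{i+1}|} = \frac{|1 + \gamma^i(\mu^i - \lambda_1)|}{|1 + \gamma^i(\mu^i - \lambda_l)|} \cdot \frac{|a_1^i|}{|a_l^i|} > \frac{|a_1^i|}{|a_l^i|}.$$

Hence

$$\frac{|a_1^i|}{|a_l^i|} \geq \frac{|a_1^0|}{|a_l^0|} > 0 \quad \text{for } i = 0, 1, 2, \ldots.$$

This contradiction shows that $l = 1$. We now have

$$\mu^i \to \lambda_1, \qquad b_1^i \to 1, \qquad b_j^i = \frac{a_j^i}{|x^i|} \to 0, \quad j \neq 1,$$

since

$$a_1^i \geq a_1^0 > 0.$$

The theorem now follows from the first equation of
(16).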

Theorem 4.2. If, in addition to the hypotheses
of theorem 4.1, the sequence $\gamma^i$ satisfies the condition

$$\gamma^i \leq K_2, \qquad K_2 = \text{const.},$$

then the conclusion of that theorem may be strengthened to

$$\lim_{i \to \infty} x^i = L y_1,$$

where $L$ is a positive constant.

By theorem 4.1 it is sufficient to show that $|x^i| \to L$.
From eq (12),

$$|x^{i+1}|^2 = |x^i|^2 [1 + (\gamma^i t^i)^2] = |x^0|^2 [1 + (\gamma^0 t^0)^2][1 + (\gamma^1 t^1)^2] \cdots [1 + (\gamma^i t^i)^2].$$

It is well known that the product on the right converges
if the series $\sum (\gamma^i t^i)^2$ does. From eq (19) the
series $\sum (t^i)^2$ converges, and hence the first series does, by
the boundedness of $\gamma^i$. This completes the proof.

V. The Method of Fixed $\alpha$

In this gradient method we choose $\gamma(x)$ to be a
constant $\alpha$ satisfying eq (8); thus,

$$0 < \alpha < \frac{2}{M}. \tag{21}$$

It follows that

$$f(\alpha) > \frac{\alpha(2 - \alpha M)}{1 + \alpha^2 t^2} > 0.$$

From eq (1) we derive

$$|\xi| \leq 2K|x|,$$

where $K$ is the bound of the matrix $A$. Hence $t$ is
bounded from above. Thus $f(\alpha)$ is bounded from
below by a positive constant. Theorem 4.2 is now
applicable.

Theorem 5.1. Let $\gamma^i$ be chosen as a constant $\alpha$
satisfying eq (21). Then

$$\lim_{i \to \infty} x^i = L y_1, \qquad \lim_{i \to \infty} \mu^i = \lambda_1.$$
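As an illustration of theorem 5.1, the fixed $\alpha$ iteration can be run on a small matrix. The sketch below is not taken from the paper: the matrix (roots $2 - \sqrt{2}$, $2$, $2 + \sqrt{2}$, hence $M = 2\sqrt{2}$), the initial vector, the value $\alpha = 0.3$, the number of steps, and the function names are all assumptions made for the example.

```python
import math

def matvec(A, x):
    """Return the product A x for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def dot(x, y):
    """Inner product (x, y)."""
    return sum(a * b for a, b in zip(x, y))

def fixed_alpha_iteration(A, x0, alpha, steps):
    """Apply x <- x - alpha * xi(x) the given number of times; return (x, mu(x))."""
    x = list(x0)
    for _ in range(steps):
        Ax = matvec(A, x)
        mu = dot(Ax, x) / dot(x, x)
        xi = [ax - mu * xj for ax, xj in zip(Ax, x)]
        x = [xj - alpha * g for xj, g in zip(x, xi)]
    return x, dot(matvec(A, x), x) / dot(x, x)

# Hypothetical example data (not from the paper).
A = [[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]]
x0 = [1.0, 0.0, 0.0]       # has a positive component along every characteristic vector
alpha = 0.3                # below 2/M = 1/sqrt(2)

x, mu = fixed_alpha_iteration(A, x0, alpha, 200)
norm = math.sqrt(dot(x, x))
direction = [xj / norm for xj in x]
```

For this choice of data the computed `mu` approaches $2 - \sqrt{2}$ and `direction` approaches the unit vector $(1, -\sqrt{2}, 1)/2$, while `norm` settles to a finite positive value playing the role of $L$.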

For later use we note that

$$\lim_{i \to \infty} \frac{|\xi^i|}{|x^i|} = 0, \tag{22}$$

which was established in the proof of theorem 4.1.

We propose to show now that under a strengthening
of condition (21) the gradients $\xi^i$ converge in
direction to the second characteristic vector $y_2$.
The new condition is

$$0 < \alpha < \frac{1}{M}. \tag{23}$$

Lemma 5.1. Under the condition (23) the sequence
$\{x^i\}$ is infinite and

$$a_j^i > 0, \qquad \xi^i \neq 0, \qquad j = 1, 2, \ldots, r; \quad i = 0, 1, 2, \ldots.$$

From eq (16)

$$a_j^i = \{1 + \alpha(\mu^{i-1} - \lambda_j)\}\, a_j^{i-1}. \tag{24}$$

By eq (23), the expression in braces is positive. Hence,
from $a_j^0 > 0$, we have $a_j^i > 0$. From the second equation
(18) and the assumption $r \geq 2$, it follows that
$\xi^i \neq 0$. The last inequality assures that $\{x^i\}$ is infinite.

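We introduce the following notation

$$\delta_j = 1 - \alpha(\lambda_j - \lambda_1), \qquad j = 1, 2, \ldots, r. \tag{25}$$

Thus, under the condition (23),

$$1 = \delta_1 > \delta_2 > \cdots > \delta_r > 0.$$

Lemma 5.2. Under the condition (23),

$$\lim_{i \to \infty} \frac{a_j^{i+1}}{a_j^i} = \delta_j; \qquad \lim_{i \to \infty} \frac{a_k^i}{a_j^i} = 0 \quad \text{for } j < k.$$

The first equation follows from (24) and the fact
that $\mu^i \to \lambda_1$. To prove the second equation notice
that for $j < k$,

$$\frac{a_k^{i+1}/a_j^{i+1}}{a_k^i/a_j^i} \to \frac{\delta_k}{\delta_j} \quad \text{and} \quad 0 < \frac{\delta_k}{\delta_j} < 1.$$

Hence, for any $\epsilon > 0$ there is a constant $K$ and index
$i_0$ such that

$$\frac{a_k^i}{a_j^i} < K\left(\frac{\delta_k}{\delta_j} + \epsilon\right)^i, \qquad i > i_0.$$

Choose $\epsilon$ so that the number in parentheses is less
than 1. This completes the proof.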
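Lemma 5.3. Under the condition (23),

$$\lim_{i \to \infty} \frac{\mu^i - \lambda_1}{(a_2^i/a_1^i)^2} = \lambda_2 - \lambda_1, \qquad \lim_{i \to \infty} \frac{|\xi^i|}{a_2^i} = \lambda_2 - \lambda_1.$$

From the orthogonality relation $(x^i, \xi^i) = 0$ and
eq (16) and (17) we obtain

$$(\lambda_1 - \mu^i)(a_1^i)^2 + (\lambda_2 - \mu^i)(a_2^i)^2 + \cdots + (\lambda_r - \mu^i)(a_r^i)^2 = 0.$$

Divide by $(a_2^i)^2$ and take the limit as $i \to \infty$. The
first conclusion of the lemma then follows from
lemma 5.2. Using this result we establish the second
conclusion in a similar way by means of eq (18).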

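Theorem 5.2. Let the constant $\alpha$ be chosen to
satisfy (23). Then

$$\lim_{i \to \infty} \frac{\xi^i}{|\xi^i|} = y_2, \qquad \lim_{i \to \infty} \mu(\xi^i) = \lambda_2.$$

Divide both sides of eq (17) by $a_2^i$. Using lemmas
5.2 and 5.3 we obtain

$$\lim_{i \to \infty} \frac{\xi^i}{a_2^i} = (\lambda_2 - \lambda_1)\, y_2.$$

It follows that

$$\frac{\xi^i}{|\xi^i|} = \frac{\xi^i}{a_2^i} \cdot \frac{a_2^i}{|\xi^i|} \to y_2,$$

as desired. Also,

$$\mu(\xi^i) = \mu\!\left(\frac{\xi^i}{|\xi^i|}\right) \to \mu(y_2) = \lambda_2.$$

This completes the proof.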
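The behavior of the gradient direction described in theorem 5.2 can be observed numerically. The sketch below is only illustrative; the matrix (roots $2 - \sqrt{2}$, $2$, $2 + \sqrt{2}$), the initial vector, $\alpha = 0.3 < 1/M$, the number of steps, and the function names are assumptions, not taken from the paper. Only a moderate number of steps is used, because $|\xi^i|$ shrinks geometrically and the computed gradient gradually loses significant figures.

```python
import math

def matvec(A, x):
    """Return the product A x for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def dot(x, y):
    """Inner product (x, y)."""
    return sum(a * b for a, b in zip(x, y))

def gradient(A, x):
    """xi(x) = A x - mu(x) x, as in eq (1)."""
    Ax = matvec(A, x)
    mu = dot(Ax, x) / dot(x, x)
    return [ax - mu * xj for ax, xj in zip(Ax, x)]

def gradient_direction(A, x0, alpha, steps):
    """Run the fixed-alpha iteration, then return (xi / |xi|, mu(xi)) at the last x."""
    x = list(x0)
    for _ in range(steps):
        xi = gradient(A, x)
        x = [xj - alpha * g for xj, g in zip(x, xi)]
    xi = gradient(A, x)
    norm = math.sqrt(dot(xi, xi))
    return [g / norm for g in xi], dot(matvec(A, xi), xi) / dot(xi, xi)

# Hypothetical example data (not from the paper).
A = [[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]]
x0 = [1.0, 0.0, 0.0]
alpha = 0.3                      # below 1/M, with M = 2*sqrt(2)

eta, mu_xi = gradient_direction(A, x0, alpha, 30)
```

With these data `eta` comes out close to $(1, 0, -1)/\sqrt{2}$, the characteristic vector for the middle root, and `mu_xi` is close to 2.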

We remark that for the proofs of theorems 5.1
and 5.2 the conditions (21) and (23) could have
been relaxed by using $\lambda_r - \lambda_1$ in place of $M = \Lambda_p - \Lambda_1$.

We conclude this section with a reformulation of
theorems 5.1 and 5.2 for convergence to the two
highest characteristic vectors $y_r$, $y_{r-1}$.

Theorem 5.3. Let the constant $\alpha$ satisfy eq (21).
Define the sequence $\{x^i\}$ by the recursion formula

$$x^{i+1} = x^i + \alpha\xi^i.$$

Then

$$\lim_{i \to \infty} x^i = L y_r, \qquad \lim_{i \to \infty} \mu(x^i) = \lambda_r, \qquad L > 0.$$

If, further, $\alpha$ satisfies eq (23), then

$$\lim_{i \to \infty} \frac{\xi^i}{|\xi^i|} = -y_{r-1}, \qquad \lim_{i \to \infty} \mu(\xi^i) = \lambda_{r-1}.$$

This result is obtained by replacing $A$ by $B = -A$
in theorems 5.1 and 5.2, and noticing that the characteristic
roots become

$$-\lambda_r < -\lambda_{r-1} < \cdots < -\lambda_1$$

with corresponding characteristic vectors $y_r, y_{r-1}, \ldots, y_1$.

VI. Rate of Convergence for Fixed $\alpha$

In this section we investigate the rate of convergence
of the sequences $\{x^i\}$, $\{\xi^i\}$ and related sequences;
in the rest of this section we assume eq (23) holds.
For convenience we introduce the so-called "ratio"
of a sequence of real numbers as a measure of speed
of convergence to 0, and develop some elementary
properties of sequences with ratios. This notion
will also be useful in our later generalization of the
method of fixed $\alpha$.

Let us agree that a sequence $\{b^i\}$ of real numbers
will be called positive (negative) if $b^i > 0$ ($< 0$) for
$i \geq i_0$, $i_0$ fixed. We shall understand that the sequence
is monotonic in case it is monotonic for $i \geq i_0$.

Definition. A sequence $\{b^i\}$ of real numbers will
be said to have the ratio $\kappa$ in case

$$\lim_{i \to \infty} \frac{b^{i+1}}{b^i} = \kappa \quad \text{with } \kappa > 0.$$

Necessarily a sequence with a ratio is either positive
or negative.

The reason for this definition lies in the next
lemma, which is essentially a rephrasing of lemma
5.2.

Lemma 6.1. The sequences $\{a_j^i\}$, $\{a_k^i/a_j^i\}$ ($j, k = 1, \ldots, r$)
have ratios $\delta_j$, $\delta_k/\delta_j$, respectively.

Sequences with ratios resemble geometric progressions.
If $\{b^i\}$ has ratio $\kappa$, then for an arbitrary
$\epsilon > 0$, with $\epsilon < \kappa$, there are numbers $T_1 > 0$, $T_2 > 0$
such that

$$T_1(\kappa - \epsilon)^i < |b^i| < T_2(\kappa + \epsilon)^i.$$

Thus, if $\kappa < 1$, then $b^i$ tends to zero more rapidly,
eventually, than a geometric progression with ratio
$\kappa + \epsilon$, and more slowly than the progression with
ratio $\kappa - \epsilon$. Accordingly the ratio of a sequence is a
measure of the speed of convergence to zero. We
shall express our results on rate of convergence in
these terms. Notice that when $\kappa < 1$ the sequence $\{b^i\}$
is monotonic; decreasing if the sequence is positive,
increasing if negative.



Suppose $\{b^i\}$, $\{c^i\}$ have ratios $\kappa_1$, $\kappa_2$ respectively.
Then clearly $\{b^i c^i\}$ has ratio $\kappa_1\kappa_2$ and $\{b^i/c^i\}$ has ratio
$\kappa_1/\kappa_2$. A convergent sequence $\{d^i\}$ with a nonzero
limit has ratio 1. Consequently each of the sequences
$\{b^i d^i\}$, $\{b^i/d^i\}$ has ratio $\kappa_1$. The following lemma will
be used frequently.

Lemma 6.2. Let $\{b^i\}$ have ratio $\kappa$. Suppose $\{d^i\}$
is a sequence such that $d^i/b^i$ has a nonzero limit.
Then $\{d^i\}$ has ratio $\kappa$.

For, $d^i/b^i$ has ratio 1, and the product $d^i = (d^i/b^i)\, b^i$
has ratio $\kappa$.

If $\kappa \neq 1$, then the sequence of differences $b^i - b^{i+1}$ also
has ratio $\kappa$. For this we apply the last lemma with
$d^i = b^i - b^{i+1}$.

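Lemma 6.3. Suppose $\{b^i\}$ has ratio $\delta < 1$. Let

$$B^i = \sum_{j=i}^{\infty} b^j.$$

Then

$$\lim_{i \to \infty} \frac{B^i}{b^i} = \frac{1}{1 - \delta}.$$

In particular, $\{B^i\}$ has ratio $\delta$.

Clearly we may assume that $\{b^i\}$ is positive. Select
an arbitrary $\epsilon$ with $0 < \epsilon < \delta$, $\delta + \epsilon < 1$. Then for $i$
sufficiently large, $(\delta - \epsilon) b^i < b^{i+1} < (\delta + \epsilon) b^i$,
$(\delta - \epsilon)^2 b^i < b^{i+2} < (\delta + \epsilon)^2 b^i$, and in general

$$(\delta - \epsilon)^j b^i \leq b^{i+j} \leq (\delta + \epsilon)^j b^i, \qquad j = 0, 1, 2, \ldots.$$

Hence, summing,

$$\frac{1}{1 - \delta + \epsilon} \leq \liminf_{i \to \infty} \frac{B^i}{b^i} \leq \limsup_{i \to \infty} \frac{B^i}{b^i} \leq \frac{1}{1 - \delta - \epsilon}.$$

Since $\epsilon$ is arbitrary, the desired conclusion follows.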

Lemma 6.4. Suppose $\{b^i\}$ is a positive sequence
with ratio $\delta < 1$. Let

$$Q^i = \prod_{j=i}^{\infty} (1 + b^j).$$

Then

$$\lim_{i \to \infty} \frac{Q^i - 1}{b^i} = \frac{1}{1 - \delta}.$$

The existence of the infinite product $Q^i$ follows
from the fact that the series $\sum_{j=i}^{\infty} b^j$ of positive terms
converges. Now

$$\prod_{j=i}^{i+l} (1 + b^j) = 1 + (b^i + b^{i+1} + \cdots + b^{i+l}) + (b^i b^{i+1} + \cdots) + \cdots + (b^i b^{i+1} \cdots b^{i+l}).$$

When $i$ is sufficiently large for $b^j$, $j \geq i$, to be positive,

$$\sum_{j=i}^{i+l} b^j \leq \prod_{j=i}^{i+l} (1 + b^j) - 1 \leq B^i + (B^i)^2 + \cdots.$$

Allowing $l \to \infty$, we obtain

$$B^i \leq Q^i - 1 \leq \frac{B^i}{1 - B^i}.$$

Dividing by $b^i$, letting $i \to \infty$, and using lemma 6.3
we obtain the desired result.

We are now ready to state the main results of the
section.

Theorem 6.1. Let eq (23) hold. Then $\mu^i - \lambda_1$ is a
positive sequence with ratio $\delta_2^2$.

This result is an immediate consequence of lemmas
5.3, 6.1, and 6.2.

From this theorem it follows that $\mu^i - \mu^{i+1}$ is also a
sequence with ratio $\delta_2^2$. Likewise, $(t^i)^2$ is a sequence
with the same ratio, as we see from lemma 5.3.
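The ratio $\delta_2^2$ of theorem 6.1 can be observed from the successive differences $\mu^i - \mu^{i+1}$. The sketch below is illustrative only; the matrix (roots $2 - \sqrt{2}$, $2$, $2 + \sqrt{2}$), the initial vector, $\alpha = 0.3$, the indices at which the quotient is sampled, and the function names are assumptions made here.

```python
import math

def matvec(A, x):
    """Return the product A x for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def dot(x, y):
    """Inner product (x, y)."""
    return sum(a * b for a, b in zip(x, y))

def mu_sequence(A, x0, alpha, steps):
    """Return [mu^0, ..., mu^steps] for the fixed-alpha iteration."""
    x = list(x0)
    mus = []
    for _ in range(steps + 1):
        Ax = matvec(A, x)
        mu = dot(Ax, x) / dot(x, x)
        mus.append(mu)
        # x <- x - alpha * xi, with xi = A x - mu x
        x = [xj - alpha * (ax - mu * xj) for xj, ax in zip(x, Ax)]
    return mus

# Hypothetical example data (not from the paper).
A = [[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]]
alpha = 0.3

mus = mu_sequence(A, [1.0, 0.0, 0.0], alpha, 17)
diffs = [mus[i] - mus[i + 1] for i in range(len(mus) - 1)]
observed = diffs[15] / diffs[14]

lambda_1 = 2.0 - math.sqrt(2.0)
lambda_2 = 2.0
delta_2 = 1.0 - alpha * (lambda_2 - lambda_1)   # eq (25) with j = 2
predicted = delta_2 ** 2
```

In a practical calculation the roots are of course unknown, and the observed quotient would be used the other way round, as an estimate of $\delta_2^2$.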

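Lemma 6.5. The positive sequence $L - a_1^i$ has
ratio $\delta_2^2$.

The number $L$ is that of theorem 5.1. We have
$a_1^i \to L$. From eq (16),

$$a_1^i = a_1^0 \prod_{k=0}^{i-1} \{1 + \alpha(\mu^k - \lambda_1)\}, \qquad L = a_1^0 \prod_{k=0}^{\infty} \{1 + \alpha(\mu^k - \lambda_1)\}.$$

Thus

$$L - a_1^i = a_1^i \left[\prod_{k=i}^{\infty} \{1 + \alpha(\mu^k - \lambda_1)\} - 1\right].$$

Since $\mu^k - \lambda_1$ has ratio $\delta_2^2$, the conclusion follows from
lemma 6.4.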

Theorem 6.2. Let eq (23) hold. Then the sequence
$|x^i - Ly_1|$ has ratio $\delta_2$.

From (16)

$$x^i - Ly_1 = (a_1^i - L) y_1 + a_2^i y_2 + \cdots + a_r^i y_r. \tag{26}$$

Thus

$$|x^i - Ly_1|^2 = (a_1^i - L)^2 + (a_2^i)^2 + \cdots + (a_r^i)^2.$$

Hence the left side is a sum of terms having ratios
$\delta_2^4, \delta_2^2, \delta_3^2, \ldots, \delta_r^2$. Since each of these numbers, other
than the second, is strictly smaller than the second,
the theorem is established.

Concerning the convergence of $|x^i|$ to $L$, we remark
that $L - |x^i|$ is a sequence of ratio $\delta_2^2$. The proof is
like that of lemma 6.5, with $a_1^i$ replaced by $|x^i|^2$.

Information on the convergence of each component
of $x^i$ can be derived from eq (26). We write
$x^i = (b_1^i, b_2^i, \ldots, b_n^i)$, $y_1 = (c_1, c_2, \ldots, c_n)$. Now for
fixed $j$, $b_j^i - Lc_j$ is a linear combination of $a_1^i - L, a_2^i,
\ldots, a_r^i$, these sequences having respective ratios $\delta_2^2,
\delta_2, \delta_3, \ldots, \delta_r$. If the $j$th component of $y_2$, the
coefficient of $a_2^i$, is not 0, then $b_j^i - Lc_j$ has ratio $\delta_2$;
accordingly the difference sequence $b_j^i - b_j^{i+1}$, which
is numerically available, has ratio $\delta_2$. If the $j$th
component of $y_2$ is 0, then, in general, $b_j^i - Lc_j$ will
have a smaller ratio (assuming the sequence is not
identically 0).

For results on the convergence of sequences associated
with the second characteristic vector $y_2$, we
appeal to the forthcoming theorems of section 11.
These results may be specialized to the sequence
$\{\xi^i\}$ by recognizing that in the notation of that
section,

$$x^i = z_1^i, \qquad \xi^i = z_2^i, \qquad \mu_1^i = \mu(x^i), \qquad \mu_2^i = \mu(\xi^i).$$

Also, we agree to interpret $\delta_{r+1}$ as 0.

Concerning the rate of convergence of $|\xi^i|$ to 0 we
have, from lemma 5.3, that $\{|\xi^i|\}$ is a sequence with
ratio $\delta_2$.

Theorem 6.3. Let eq (23) hold and set $\eta^i = \xi^i/|\xi^i|$.

(1) If $\delta_2^2 > \delta_3$, then $\{\lambda_2 - \mu(\xi^i)\}$ is a positive, monotonic
sequence with ratio $\delta_2^2$, and $\{|\eta^i - y_2|\}$ is a sequence
with ratio $\delta_2$.

(2) If $\delta_2^2 < \delta_3$, then $\{\mu(\xi^i) - \lambda_2\}$ is a positive, monotonic
sequence with ratio $(\delta_3/\delta_2)^2$, and $\{|\eta^i - y_2|\}$ is a sequence
with ratio $\delta_3/\delta_2$.

(3) If $\delta_2^2 = \delta_3$, then $\mu(\xi^i) - \lambda_2 = O(\delta_2^{2i})$ and $|\eta^i - y_2| = O(\delta_2^i)$.

These statements are interpretations of theorems
11.1, 11.2, and their corollaries for $j = 2$. From
theorem 11.4 we derive the inequality

$$\lambda_2 - \mu(\xi^i) \leq \mu(x^i) - \lambda_1, \qquad i = 0, 1, 2, \ldots,$$

the equality holding just in case $r = 2$. From this
and the above theorem we obtain the following interesting
corollary.

Corollary. If $\delta_2^2 > \delta_3$, then for sufficiently large $i$,

$$|\lambda_2 - \mu(\xi^i)| \leq |\lambda_1 - \mu(x^i)|.$$

More generally, when $\lambda_2 - \mu(\xi^i)$ is nonnegative,
and $i$ is large, then $\mu(\xi^i)$ is closer to $\lambda_2$ than $\mu(x^i)$ is
to $\lambda_1$. On the other hand, for case (2) of theorem
6.3, the sequence $\{\mu(\xi^i) - \lambda_2\}$ has ratio $(\delta_3/\delta_2)^2 > \delta_2^2$.
But $\delta_2^2$ is the ratio of $\{\mu^i - \lambda_1\}$. Hence in this
instance $\mu^i$ approaches $\lambda_1$ more rapidly than $\mu(\xi^i)$
approaches $\lambda_2$.

Notice that for $r = 2$, the inequality $\delta_2^2 > \delta_3$ is
automatic. If $r > 2$, then, from theorem 11.3, the
inequality $\delta_2^2 > \delta_3$ holds whenever

$$\lambda_3 - \lambda_2 \geq \lambda_2 - \lambda_1.$$

VII. The Method of Optimum $\alpha$

In sections 10 and 11 the method of fixed $\alpha$ will
be extended to obtain sequences that converge to all
of the characteristic vectors $y_1, y_2, \ldots, y_r$. Before
proceeding to this generalization we wish to develop
first the method of optimum $\alpha$, so that a comparison
of the two methods (and others) may be made at
the most advantageous point. The optimum procedure
does not appear to generalize to a larger
number of characteristic vectors as simply as the
method of fixed $\alpha$.

According to eq (10) the iteration scheme is now
given by eq (12) with

$$\gamma^i = \frac{2}{s^i + \sqrt{(s^i)^2 + 4(t^i)^2}} > 0. \tag{27}$$

The numbers $t^i$ are bounded (see remarks preceding
theorem 5.1); so are the numbers $s^i$. Thus
$f(\gamma^i) = \gamma^i$ is bounded from below by a positive constant.
Accordingly, theorem 4.1 yields the following
lemma.

Lemma 7.1. Let $\gamma^i$ be given by eq (27). Then the
conclusion of theorem 4.1 holds.

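In order to apply theorem 4.2 it is necessary to
establish the boundedness of the $\gamma^i$. This is done in
the next lemma.

Lemma 7.2. Let $\gamma^i$ be given by eq (27). Then

$$\limsup_{i \to \infty} \gamma^i \leq \frac{1}{\lambda_2 - \lambda_1}.$$

Since $\mu^i \to \lambda_1$ it is sufficient, by eq (11), to show that

$$\liminf_{i \to \infty} \mu(\xi^i) \geq \lambda_2.$$

Let $\tau$ denote the inferior limit on the left. Let

$$\eta^i = \frac{\xi^i}{|\xi^i|}.$$

Then $\tau = \liminf \mu(\eta^i)$. Select a subsequence $\{\eta^{i'}\}$ such
that $\tau = \lim \mu(\eta^{i'})$. Select a further subsequence
$\{\eta^{i''}\}$ that converges to, say, $\eta$. From $(\eta^i, x^i) = 0$
and lemma 7.1 we obtain

$$0 = \lim \left(\eta^{i''}, \frac{x^{i''}}{|x^{i''}|}\right) = (\eta, y_1).$$

Since $\eta$ belongs to $\mathcal{A}_r$ (space spanned by $y_1, \ldots, y_r$)
it follows that $\mu(\eta) \geq \lambda_2$. Hence

$$\tau = \lim \mu(\eta^{i''}) = \mu(\eta) \geq \lambda_2,$$

as desired.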

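Theorem 4.2 now yields the principal result.

Theorem 7.1. Let $\gamma^i$ be given by eq (27). Then

$$\lim_{i \to \infty} x^i = L y_1, \qquad \lim_{i \to \infty} \mu^i = \lambda_1.$$

That the vectors $\xi^i/|\xi^i|$ do not converge in the
method of optimum $\alpha$ is a consequence of the next
theorem.

Theorem 7.2. Under the hypotheses of theorem
7.1,

$$(\xi^i, \xi^{i+1}) = 0.$$

Proof. We have,

$$\xi^{i+1} = Ax^{i+1} - \mu^{i+1}x^{i+1} = Ax^i - \gamma^i A\xi^i - \mu^{i+1}(x^i - \gamma^i\xi^i).$$

Hence

$$(\xi^i, \xi^{i+1}) = (\xi^i, Ax^i) - \gamma^i(A\xi^i, \xi^i) + \mu^{i+1}\gamma^i(\xi^i, \xi^i) = \{1 - \gamma^i(\mu(\xi^i) - \mu^{i+1})\}(\xi^i, \xi^i) = 0$$

by eq (11).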
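The orthogonality of successive gradients in the optimum procedure can be seen in a short computation. The sketch below is illustrative, not the authors' program: the matrix (smallest root $2 - \sqrt{2}$), the initial vector, the stopping threshold on $|\xi|/|x|$, and all names are assumptions. The iteration is stopped once the gradient is very small relative to $x$, since beyond that point the computed $\xi$ consists mostly of rounding error.

```python
import math

def matvec(A, x):
    """Return the product A x for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def dot(x, y):
    """Inner product (x, y)."""
    return sum(a * b for a, b in zip(x, y))

def optimum_step(A, x):
    """One step of eq (12) with gamma from eq (27).

    Returns (x_next, xi); x_next is None when |xi|/|x| is below 1e-12
    (an assumed tolerance), which is treated as termination.
    """
    Ax = matvec(A, x)
    mu = dot(Ax, x) / dot(x, x)
    xi = [ax - mu * xj for ax, xj in zip(Ax, x)]
    xi2 = dot(xi, xi)
    if xi2 <= 1e-24 * dot(x, x):
        return None, xi
    t2 = xi2 / dot(x, x)
    s = dot(matvec(A, xi), xi) / xi2 - mu
    gamma = 2.0 / (s + math.sqrt(s * s + 4.0 * t2))
    return [xj - gamma * g for xj, g in zip(x, xi)], xi

# Hypothetical example data (not from the paper).
A = [[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]]
x = [1.0, 0.0, 0.0]

cosines = []      # cosines of the angles between successive gradients (first few steps)
previous = None
for _ in range(60):
    x_next, xi = optimum_step(A, x)
    if x_next is None:
        break
    if previous is not None and len(cosines) < 5:
        cosines.append(dot(previous, xi) / math.sqrt(dot(previous, previous) * dot(xi, xi)))
    previous = xi
    x = x_next
mu = dot(matvec(A, x), x) / dot(x, x)
```

The recorded cosines are at rounding level, while `mu` approaches the smallest root of the assumed matrix.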

The last theorem indicates that the coefficients $a_j^i$ of
eq (16) converge in an irregular fashion as $i \to \infty$.
Hence a determination of the rate of convergence
such as that given in section 6 for the method of fixed
$\alpha$ is not to be expected here.

We conclude the section with a modification of
theorem 7.1 for convergence to the highest characteristic
vector $y_r$.

Theorem 7.3. Let the sequence $\{x^i\}$ be defined
by

$$x^{i+1} = x^i + \gamma^i\xi^i,$$

with

$$\gamma^i = \frac{2}{-s^i + \sqrt{(s^i)^2 + 4(t^i)^2}}.$$

Then

$$\lim_{i \to \infty} x^i = L y_r, \qquad L > 0.$$

This result is obtained by applying theorem 7.1
to the matrix $B = -A$.



VIII. Comparison of Methods 

Each step in the preceding methods of gradients
consists of forming the subspace spanned by $x$ and
$\xi$, i.e., by $x$ and $Ax$, and choosing in that two-space⁶
a next approximation $\bar{x}$. In the method of optimum
$\alpha$, the vector $\bar{x}$ is chosen so that $\mu(\bar{x})$ is a minimum
in the subspace; in the method of fixed $\alpha$ the particular
linear combination $x - \alpha\xi$ is chosen. Superficially,
then, it would seem that the first method
should give more rapid convergence to $\lambda_1$ and $y_1$
than the second. But the minimizing procedure can
only be recognized as an advantage when the vector
$x^i$ is common to the two methods. This certainly
holds for the initial vector $x^0$, but not necessarily
beyond this stage; hence the relative merits of the
two methods are not evident. It seems reasonable
that for low order matrices the procedure of finding
an optimum $\alpha$ might be advantageous, while for
matrices of higher order the fixed $\alpha$ procedure might
be superior. In practice a combination of the two
methods would be in order.

The method of fixed $\alpha$ has the advantage of computational
simplicity over the alternative method.
On the other hand the former requires some advance
information on the characteristic roots in order to
estimate an allowable $\alpha$; computation by the method
of optimum $\alpha$ demands no such knowledge.

6 We shall remark later on the possibility of dealing with subspaces of dimension
higher than two. Cf. a forthcoming paper by W. Karush, An iterative method
for finding characteristic vectors of a symmetric matrix. There the method of
optimum $\alpha$ is generalized to subspaces of arbitrary (fixed) dimension.

The method of fixed $\alpha$ is a smooth method. The
vectors $\xi^i/|\xi^i|$ converge; by a relatively simple extension
of the procedure we can obtain convergence
to all the characteristic vectors of $\mathcal{A}_r$ (see section 10).
Furthermore the convergence is geometric in nature.
The method of optimum $\alpha$ has none of these advantages.
The successive gradients are orthogonal, and
the coefficients $a_2^i, a_3^i, \ldots$ of eq (16) do not tend to
0 smoothly.

Sequences $\{b^i\}$ with ratio $\delta < 1$ provide examples of
"linear" convergence. If the ratio $b^{i+1}/(b^i)^2$ has a
finite limit, then the sequence converges in a "quadratic"
fashion. In the method of fixed $\alpha$ we have
convergence of a linear type. It is possible to
procure quadratic convergence by modifying the
method, but the price paid would be the solution of a
system of linear equations at each step of the
iteration.

A combination of the two gradient methods can
be used to advantage when the characteristic roots
$\lambda_1$, $\lambda_2$ are close and relatively isolated from the other
roots. Suppose we begin with the fixed $\alpha$ iteration
and continue up to a certain stage. The last estimate
$x$ and its gradient $\xi$ may not be good individual
approximations to $y_1$ and $y_2$ but the pair may provide
an excellent approximation to the plane of $y_1$ and $y_2$.
One application of the optimum $\alpha$ procedure at this
stage should then yield a good approximation $\bar{x}$ to $y_1$.
The iteration may be started again with $\bar{x}$. This
technique has worked well in several numerical
examples.

Since $t = |\xi|/|x| \to 0$, it is clear that fewer and fewer
significant figures will be retained in $\xi$ as the iteration
proceeds in a numerical calculation. Nevertheless,
$\xi$ and $\mu(\xi)$ supply useful approximations to $y_2$ and
$\lambda_2$. For example, after $y_1$ and $\lambda_1$ have been obtained,
the approximation $\xi$ can be used as the initial vector
$x^0$ in the calculation of $y_2$ and $\lambda_2$ by a new use of the
method of gradients.

The above remarks apply equally well to the 
highest vector yr. One of the advantages of the 
gradient methods is that they can be applied directly 
to either end of the scale of characteristic roots. 

A single step (2) of the gradient procedure 
carries one from 



^ = CL\yi + (l2y2 + ' • ' + CLryr 



to 



x = {l-\-a{^i — \i)}aiyi+{l + a{ix — \i)]a2y2+' • •• 

Thus if an}^ characteristic number X^ is known, then 
orthogonality of x to yj can be achieved by choosing^ 



\j — iX 



' The same result can be obtained by taking J=.4j— X/j; the advantage of the 
above procedure is that it fits into the iteration scheme (12). 



This observation may be used to maintain (approximate) orthogonality which may
be theoretically assured but may be gradually lost in computation due to
round-off errors. For example, in beginning with $\xi$ as an initial vector for
obtaining $y_2$ (see second paragraph above), one must maintain orthogonality
to $y_1$ during the iteration process. To do this, the procedure (12) may be
interspersed with steps in which $\gamma^i=(\lambda_1-\mu^i)^{-1}$. Similarly,
one may maintain orthogonality to any number of characteristic vectors, or
induce orthogonality to such vectors, provided only that one knows the
corresponding characteristic roots (relatively accurately). The desired ends
are achieved within the framework of the scheme (12) by particular choices of
$\gamma^i$. Notice that if $r-1$ of the $r$ characteristic roots $\lambda_j$
are known then the characteristic vector belonging to the remaining root can be
obtained in $r-1$ steps.
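The orthogonalizing choice of the step can be checked on a small case. The following is a minimal sketch, not the authors' program: it assumes a diagonal test matrix (so that the characteristic vectors are the coordinate vectors), and the helper names `dot`, `matvec`, `rayleigh`, and `gradient_step` are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    return [dot(row, x) for row in A]

def rayleigh(A, x):
    """Rayleigh quotient mu(x) = (x, Ax) / (x, x)."""
    return dot(x, matvec(A, x)) / dot(x, x)

def gradient_step(A, x, step):
    """One step x - step * (A x - mu(x) x) of the gradient procedure."""
    mu = rayleigh(A, x)
    Ax = matvec(A, x)
    return [xi - step * (axi - mu * xi) for xi, axi in zip(x, Ax)]

# Assumed test data: roots 1, 2, 4 with characteristic vectors e_1, e_2, e_3.
A = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 4.0]]
x = [1.0, 1.0, 1.0]

known_root = 4.0                                # lambda_j, assumed known
step = 1.0 / (known_root - rayleigh(A, x))      # alpha = 1 / (lambda_j - mu)
x_new = gradient_step(A, x, step)               # component along e_3 is removed
```

Each remaining component is multiplied by $1+\alpha(\mu-\lambda_j)$, so only the component belonging to the known root is annihilated.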

It was remarked after theorem 6.1 that $\Delta\mu^i=\mu^{i+1}-\mu^i$ and
$(t^i)^2$ are sequences with ratio $\delta_2^2$. Hence observation of the
ratios $\Delta\mu^{i+1}/\Delta\mu^i$ or $(t^{i+1}/t^i)^2$ can lead to an
estimate of $\delta_2$. Since $\delta_2=1-\alpha(\lambda_2-\lambda_1)$, and
$\alpha$ is known, another estimate of $\lambda_2$ may be made, alternative to
the estimate $\mu(\xi)$. Furthermore, if a sequence $c^i$ has a limit $c$, and
the difference $c^i-c$ is a sequence with a ratio $\delta<1$, then a
transformation for speeding the convergence of $c^i$ is

$$\bar{c}^{\,i} = c^i+\frac{\Delta c^i}{1-\delta} = \frac{c^{i+1}-\delta c^i}{1-\delta}.$$

With an estimate of $\delta_2$, this formula may be used to improve $\mu^i$ and
the components of $x^i$ (see remarks following theorem 6.2).
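The effect of the speeding transformation can be seen on an artificial sequence. This is only an illustrative sketch under the assumption that $c^i-c$ is exactly geometric; the function name `extrapolate` and the numbers are hypothetical.

```python
def extrapolate(c_i, c_next, delta):
    """Return (c^{i+1} - delta * c^i) / (1 - delta)."""
    return (c_next - delta * c_i) / (1.0 - delta)

# Assumed model sequence: c^i = limit + K * delta^i.
limit, K, delta = 3.0, 2.0, 0.8
c = [limit + K * delta ** i for i in range(6)]

# The ratio of successive differences estimates delta.
estimated_delta = (c[2] - c[1]) / (c[1] - c[0])

# With the exact ratio, every extrapolated value coincides with the limit.
improved = [extrapolate(c[i], c[i + 1], delta) for i in range(5)]
```

In a real calculation the ratio is only approached asymptotically, so the extrapolated values improve on $c^i$ without being exact.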

A certain degree of flexibility may be introduced in the method of fixed
$\alpha$ by replacing the constant $\alpha$ by a variable $\alpha^i$,

' ^' (28) 



'M 



which satisfies 



(i^i, i82 constants). (29) 



The conclusion of theorem 5.1 remains valid, since this choice of $\alpha^i$
does not violate the assumptions of theorem 4.2.

The convergence of the gradients as in theorem 5.2 
may also be justified provided one imposes the addi- 
tional restrictions 



32<1 and^<- 

Pi A2 — Ai 



(30) 



Under these conditions statements on rate of convergence, appropriately
formulated, can be established.

For comparison purposes we state here, without proof, some properties of the
"power" method. The iteration formula is

$$x^{i+1}=Ax^i.$$

Suppose now that there is one characteristic root of maximum absolute value;
denote it by $\lambda_k$. Then

$$\lim_{i\to\infty}\frac{x^{2i}}{|x^{2i}|}=y_k,\qquad \lim_{i\to\infty}\mu(x^i)=\lambda_k.$$

(In the first limit the sequence with odd indices tends to $-y_k$ if
$\lambda_k<0$, and to $y_k$ if $\lambda_k>0$.) It was pointed out in the
introduction that the power method is a gradient method with

$$\gamma^i=-\frac{1}{\mu^i}.$$

In this form the iteration formula is

$$z^{i+1}=\frac{Az^i}{\mu^i},$$

i.e., $z^{i+1}=(\mu^0\mu^1\cdots\mu^i)^{-1}x^{i+1}$, and the convergence result
is

$$\lim_{i\to\infty}z^i=Ky_k,\qquad K=\text{const.}$$

(provided $\mu^i$ is never 0). Thus $\mu^i$, as well as $|x^i|$, may be used as
a normalization factor.

The power method leads directly to an estimate of $\lambda_k$, and hence to an
upper bound $2|\lambda_k|$ of $M$. Thus a rough application of this method may
be used as a preliminary step in the method of fixed $\alpha$ in order to
determine an allowable value of the constant $\alpha$.
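The two-stage use of the power method described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the test matrix is arbitrary (its roots are $3-\sqrt3$, $3$, $3+\sqrt3$), the allowable range for the constant is taken to be $0<\alpha<1/M$, and the helper names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    return [dot(row, x) for row in A]

def rayleigh(A, x):
    return dot(x, matvec(A, x)) / dot(x, x)

def power_estimate(A, x, steps):
    """Power iteration x <- A x (rescaled each step); returns mu(x)."""
    for _ in range(steps):
        x = matvec(A, x)
        norm = math.sqrt(dot(x, x))
        x = [xi / norm for xi in x]
    return rayleigh(A, x)

def fixed_alpha(A, x, alpha, steps):
    """Fixed-alpha gradient iteration x <- x - alpha * (A x - mu x)."""
    for _ in range(steps):
        mu = rayleigh(A, x)
        Ax = matvec(A, x)
        x = [xi - alpha * (axi - mu * xi) for xi, axi in zip(x, Ax)]
    return rayleigh(A, x)

A = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]]
x0 = [1.0, 1.0, 1.0]

lam_k = power_estimate(A, x0, 40)    # root of largest absolute value
M_bound = 2.0 * abs(lam_k)           # upper bound for the spread of the roots
alpha = 0.9 / M_bound                # assumed admissible: 0 < alpha < 1/M
mu_min = fixed_alpha(A, x0, alpha, 300)
```

The bound $2|\lambda_k|$ is crude, so the resulting $\alpha$ is conservative and the second stage converges more slowly than it would with a sharper estimate of $M$.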

Suppose now that $\lambda_l$ is a characteristic root of next highest absolute
value. Concerning rate of convergence we remark that $\mu^i-\lambda_k$ is a
sequence of ratio $(\lambda_l/\lambda_k)^2$. If $-\lambda_l$ is not a
characteristic root, then $\xi^i=\xi(x^i)$ tends in direction to

$$\frac{\lambda_l-\lambda_k}{|\lambda_l-\lambda_k|}\,y_l;$$

the sequence with odd indices tends to the negative of this vector if
$\lambda_l<0$, otherwise to the same vector.

We conclude this section with a remark on the method of relaxations.^8 Any
method which begins with a vector $x$ and applies a sequence of transformations
to it so that the resulting sequence of vectors $x^i$ converges to a
characteristic vector, must induce the gradients $\xi(x^i)$ to tend to the null
vector. Thus any artful modification of a vector $x$ to a new vector $\bar{x}$
which brings the gradient closer to the null vector would be a plausible
procedure in a numerical calculation. The full skill and intuition of the
computer may come into play in varying the vector at any stage to produce a
better approximation. One systematic procedure is to modify a single component
of $x$ in such a way as to make a single component of the gradient vanish.
Other devices may be used; the best trick at any point depends upon the
information available at that stage. This is the flexible approach of the
method of relaxations. Clearly it is not easily adaptable to automatic
computation.

8 Cooper, J. L. B., The Solution of Natural Frequency Equations by Relaxation
Methods, Qtrly. Appl. Math., vol. 6, p. 179 (1948).

IX. Invariant Subspace

Before turning to the generalization of the method of fixed $\alpha$ we shall
develop here some properties of the invariant subspace $\mathcal{A}_r$. We will
encounter certain polynomials $p_j(\lambda)$ which have been considered by
Lanczos,^9 and some of the results of this section will overlap his work.

In place of the notation $x$ we shall use $z_1$ to denote a given vector with
expansion

$$z_1=a_1y_1+a_2y_2+\cdots+a_ry_r \tag{31}$$

as in eq (4). The space $\mathcal{A}_r=(y_1,\ldots,y_r)$ may be characterized
as in the following lemma.

Lemma 9.1. The vectors $z_1, Az_1, \ldots, A^{r-1}z_1$ span the space
$\mathcal{A}_r$.

Let $\mathcal{M}$ be the space spanned by the vectors $A^kz_1$,
$k=0,1,2,\ldots$. Then this space is the smallest invariant subspace which
contains the vector $z_1$. Since $z_1$ is in $\mathcal{A}_r$, and
$\mathcal{A}_r$ is invariant, it follows that the relation
$\mathcal{M}\subseteq\mathcal{A}_r$ holds.

Suppose that $\mathcal{M}$ were a proper subset of the invariant space
$\mathcal{A}_r$. Then $\mathcal{M}$ would be spanned by a proper subset of the
characteristic vectors $y_1, y_2, \ldots, y_r$ of $\mathcal{A}_r$.
Consequently the vector $z_1$, being in $\mathcal{M}$, would be orthogonal to
at least one of these characteristic vectors, say $y_k$. Thus, from eq (31),
$a_k=(z_1,y_k)=0$. This contradicts the assumption $a_k\neq 0$ (see eq 4).

We now have $\mathcal{M}=\mathcal{A}_r$. Consequently,
$\dim\mathcal{M}=\dim\mathcal{A}_r=r$. The lemma now follows from the fact that
$\dim\mathcal{M}$ is the least integer $h$ such that the vectors
$z_1, Az_1, \ldots, A^{h-1}z_1$ span $\mathcal{M}$.

We now define the vectors $z_j$ recursively as follows. Let

$$\mu_j=\mu(z_j),\qquad t_j=\frac{|z_j|}{|z_{j-1}|}. \tag{32}$$

Then

$$\begin{aligned}
z_2 &= Az_1-\mu_1z_1\\
z_3 &= Az_2-\mu_2z_2-t_2^2z_1\\
&\;\;\vdots\\
z_{j+1} &= Az_j-\mu_jz_j-t_j^2z_{j-1},\qquad j=2,3,\ldots,r.
\end{aligned} \tag{33}$$

The vectors $z_1, z_2, \ldots, z_r, z_{r+1}$ are well-defined and

$$z_j\neq 0,\qquad j=1,2,\ldots,r.$$

For suppose $z_1,\ldots,z_k$ are defined and none is null, $k\le r$. Then by
eq (33) $z_{k+1}$ is defined and is a linear combination of
$A^kz_1, A^{k-1}z_1, \ldots, Az_1, z_1$ with the coefficient of $A^kz_1$ equal
to one. Further, if $k\le r-1$, then $z_{k+1}\neq 0$ by lemma 9.1. This
establishes the assertion.

9 "An Iteration Method for the Solution of the Eigenvalue Problem of Linear
Differential and Integral Operators," J. Research NBS 45, 255 (1950) RP2133.

By induction it is easy to prove that

$$\begin{aligned}
(z_j,z_j) &= (z_j,Az_{j-1}), && j=2,\ldots,r+1,\\
(z_j,z_k) &= 0, && j\neq k,\quad j,k=1,2,\ldots,r+1.
\end{aligned} \tag{34}$$
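The recursion (33) and the relations (34) can be verified exactly on a small case. The sketch below is illustrative only: it assumes the diagonal matrix with roots 1, 2, 4 and the starting vector (1, 1, 1), uses exact rational arithmetic, and the function name `z_vectors` is hypothetical.

```python
from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    return [dot(row, x) for row in A]

def z_vectors(A, z1, count):
    """Return [z_1, ..., z_{count+1}] built by the recursion (33)."""
    zs = [z1]
    for j in range(count):
        z = zs[-1]
        Az = matvec(A, z)
        mu = dot(z, Az) / dot(z, z)                  # mu_j = mu(z_j)
        nxt = [a - mu * b for a, b in zip(Az, z)]
        if j > 0:
            prev = zs[-2]
            t_sq = dot(z, z) / dot(prev, prev)       # t_j^2 = |z_j|^2 / |z_{j-1}|^2
            nxt = [a - t_sq * b for a, b in zip(nxt, prev)]
        zs.append(nxt)
    return zs

# Assumed test data: r = 3, characteristic roots 1, 2, 4.
A = [[Fraction(1), 0, 0], [0, Fraction(2), 0], [0, 0, Fraction(4)]]
z1 = [Fraction(1), Fraction(1), Fraction(1)]
z1, z2, z3, z4 = z_vectors(A, z1, 3)
```

Here `z4` comes out as the null vector, in agreement with the fact that three vectors already span the space in this example.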



Lemma 9.2. Let $\mathcal{A}_j$ be the space spanned by
$z_1, Az_1, \ldots, A^{j-1}z_1$, i.e.,

$$\mathcal{A}_j=(z_1,Az_1,\ldots,A^{j-1}z_1),\qquad j=1,\ldots,r.$$

Then $(z_1,\ldots,z_j)$ is an orthogonal basis for $\mathcal{A}_j$. Also,
$z_{r+1}=0$.

By lemma 9.1, $\mathcal{A}_j$ has dimension $j$. From the above, the nonnull
vectors $(z_1,\ldots,z_j)$ are orthogonal, and lie in $\mathcal{A}_j$; hence
they span $\mathcal{A}_j$.

The final conclusion of the lemma follows from the fact that $z_{r+1}$ is
orthogonal to $z_1,\ldots,z_r$ and hence to $\mathcal{A}_r$. But from the
invariance of $\mathcal{A}_r$, $z_{r+1}$ belongs to $\mathcal{A}_r$. Thus
$z_{r+1}=0$.

Given an arbitrary vector $z_1$ the integer $r$ of eq (31) can be determined
from the above lemma; $r$ is that integer such that $z_{r+1}$ is the first
$z_j$ that vanishes. Notice that except for $j=r$, $\mathcal{A}_j$ is not the
space spanned by $y_1,\ldots,y_j$.

We consider now the expansions of the $z_j$ in terms of the $y_j$. From eq (31)
and (33),

$$z_2=(\lambda_1-\mu_1)a_1y_1+(\lambda_2-\mu_1)a_2y_2+\cdots+(\lambda_r-\mu_1)a_ry_r.$$

Thus, if we put $p_1(\lambda)=\lambda-\mu_1$ we have

$$z_2=p_1(\lambda_1)a_1y_1+p_1(\lambda_2)a_2y_2+\cdots+p_1(\lambda_r)a_ry_r.$$

Again from eq (33),

$$z_3=\{p_1(\lambda_1)(\lambda_1-\mu_2)-t_2^2\}a_1y_1+\cdots+\{p_1(\lambda_r)(\lambda_r-\mu_2)-t_2^2\}a_ry_r.$$

Hence, if we put $p_2(\lambda)=p_1(\lambda)(\lambda-\mu_2)-t_2^2$ we have

$$z_3=p_2(\lambda_1)a_1y_1+p_2(\lambda_2)a_2y_2+\cdots+p_2(\lambda_r)a_ry_r.$$

In general, if we define the polynomials $p_j(\lambda)$ by

$$p_0(\lambda)=1,\quad p_1(\lambda)=\lambda-\mu_1,\quad p_j(\lambda)=p_{j-1}(\lambda)(\lambda-\mu_j)-t_j^2p_{j-2}(\lambda),\quad j=2,\ldots,r \tag{35}$$

we have the expansions

$$z_{j+1}=p_j(\lambda_1)a_1y_1+p_j(\lambda_2)a_2y_2+\cdots+p_j(\lambda_r)a_ry_r,\qquad j=0,1,\ldots,r. \tag{36}$$

From eq (36) with $j=r$, and lemma 9.2 ($z_{r+1}=0$), we see that the roots of

$$p_r(\lambda)=0$$

are the characteristic numbers $\lambda_1,\lambda_2,\ldots,\lambda_r$. This
is a special case of the forthcoming theorem 9.2.
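The statement that the characteristic numbers are the roots of $p_r(\lambda)=0$ can be checked exactly. In the sketch below the values of $\mu_j$ and $t_j^2$ are assumptions: they are the ones produced by the recursion (33) for the diagonal matrix with roots 1, 2, 4 and the starting vector (1, 1, 1). The function name `p` is hypothetical.

```python
from fractions import Fraction

# Assumed data from the recursion (33) for A = diag(1, 2, 4), z_1 = (1, 1, 1).
mu = [Fraction(7, 3), Fraction(59, 21), Fraction(13, 7)]    # mu_1, mu_2, mu_3
t_sq = [None, Fraction(14, 9), Fraction(27, 49)]            # t_2^2, t_3^2 at index j-1

def p(j, lam):
    """Polynomial p_j(lam) from the three-term recursion (35)."""
    if j == 0:
        return Fraction(1)
    if j == 1:
        return lam - mu[0]
    return p(j - 1, lam) * (lam - mu[j - 1]) - t_sq[j - 1] * p(j - 2, lam)

# p_3 vanishes at each characteristic root of the assumed matrix.
values_at_roots = [p(3, Fraction(lam)) for lam in (1, 2, 4)]

# By (36), p_2(lambda_m) * a_m is the m-th component of z_3 (here a_m = 1).
z3_components = [p(2, Fraction(lam)) for lam in (1, 2, 4)]
```

The second list reproduces the components of $z_3$ for this example, which is the content of the expansion (36) with $j=2$.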

In lemma 9.2 we defined the subspaces $\mathcal{A}_j$ of $\mathcal{A}_r$. In
order to develop some properties of these subspaces we state without proof
several well known properties of symmetric linear operators.

Let $\mathcal{M}$ denote a linear space of real vectors. Let $B$ denote a
symmetric linear operator on $\mathcal{M}$. Consider an $l$-dimensional
subspace $\mathcal{M}'$ of $\mathcal{M}$; let $\pi$ be the projection operator
into $\mathcal{M}'$. The operator $\pi B$ with domain restricted to
$\mathcal{M}'$ is a symmetric operator on $\mathcal{M}'$; by the characteristic
roots and vectors of $B$ relative to $\mathcal{M}'$ we mean the corresponding
quantities of this operator. An alternative characterization is given by the
property that $v$ is a characteristic vector (relative to $\mathcal{M}'$)
belonging to the characteristic root $\kappa$ if and only if $v$ belongs to
$\mathcal{M}'$, $v\neq 0$, and

$$(x,Bv)=\kappa(x,v),\qquad x \text{ in } \mathcal{M}'.$$

Also, the smallest and greatest roots relative to $\mathcal{M}'$ are
respectively the minimum and maximum of $\mu(x)$ on $\mathcal{M}'$. If
$\mathcal{M}'$ is an invariant subspace of $\mathcal{M}$ then $\pi B=B$ on
$\mathcal{M}'$ and the characteristic roots and vectors relative to
$\mathcal{M}'$ are such in the whole space $\mathcal{M}$; this is the case with
$\mathcal{A}_r$ as a subspace of the whole space.

If $u_1,u_2,\ldots,u_l$ is an orthonormal basis for $\mathcal{M}'$ then the
operator $\pi B$ relative to this basis has the $l\times l$ matrix
representation

$$B'=\big((Bu_j,u_k)\big),\qquad j,k=1,2,\ldots,l.$$

The characteristic roots relative to $\mathcal{M}'$ are the roots of

$$|\lambda I-B'|=0.$$

We quote a well-known result. (The first conclusion of the theorem below holds
without the restriction that the $\kappa_j$ be distinct.)

Theorem 9.1. Let $B$ be a symmetric linear operator on an $m$-dimensional real
linear space $\mathcal{M}$; let $\mathcal{M}'$ be an $l$-dimensional subspace
of $\mathcal{M}$. Suppose that $B$ has distinct characteristic values
$\kappa_1<\kappa_2<\cdots<\kappa_m$ with corresponding characteristic vectors
$v_1,v_2,\ldots,v_m$. Let the characteristic roots and vectors of $B$ relative
to $\mathcal{M}'$ be respectively $\kappa_1'\le\kappa_2'\le\cdots\le\kappa_l'$
and $v_1',v_2',\ldots,v_l'$. Then

$$\kappa_1\le\kappa_1',\quad \kappa_2\le\kappa_2',\quad \ldots,\quad \kappa_l\le\kappa_l'.$$

The equality holds for all indices if and only if

$$v_1'=v_1,\quad v_2'=v_2,\quad\ldots,\quad v_l'=v_l;$$

in this case $\mathcal{M}'$ is an invariant subspace.^10

We return to the subspaces $\mathcal{A}_j$. The characteristic roots and
vectors of $A$ relative to $\mathcal{A}_j$ are those of the symmetric operator
$B_j=\pi_jA$ on $\mathcal{A}_j$, where $\pi_j$ is the projection on
$\mathcal{A}_j$. Since $z_1,\ldots,z_j$ is an orthogonal basis of
$\mathcal{A}_j$ the operator $B_j$ has the matrix representation

$$\left(\frac{(Az_l,z_m)}{|z_l|\,|z_m|}\right),\qquad l,m=1,2,\ldots,j.$$

We calculate this matrix.
Lemma 9.3. For a given $j$,

$$(Az_j,z_k)=\begin{cases}
|z_j|^2 & \text{when } k=j-1\\
\mu_j|z_j|^2 & \text{when } k=j\\
|z_{j+1}|^2 & \text{when } k=j+1\\
0 & \text{otherwise.}
\end{cases}$$

We remark that for $j=1$, the first equation is to be omitted. The first and
third equations follow from the first equation of (34). The second follows
from the definition of $\mu_j$. The last is a consequence of the orthogonality
relations of (34) and the fact that $Az_j$ is a linear combination of
$z_{j+1}$, $z_j$ and $z_{j-1}$ by (33).

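It follows from this lemma that $B_j$ has the matrix representation

$$B_j=\begin{pmatrix}
\mu_1 & t_2 & & & \\
t_2 & \mu_2 & t_3 & & \\
 & t_3 & \mu_3 & \ddots & \\
 & & \ddots & \ddots & t_j\\
 & & & t_j & \mu_j
\end{pmatrix}.$$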

Lemma 9.4. Let $q_0(\lambda)=1$ and $q_j(\lambda)$ be the characteristic
polynomial of $B_j$,

$$q_j(\lambda)=|\lambda I-B_j|,\qquad j=1,2,\ldots,r.$$

Then

$$p_j(\lambda)=q_j(\lambda),\qquad j=0,1,\ldots,r.$$

Using the matrix representation above it is easy to see by direct calculation
that

$$q_1(\lambda)=\lambda-\mu_1,\qquad q_j(\lambda)=(\lambda-\mu_j)q_{j-1}(\lambda)-t_j^2q_{j-2}(\lambda).$$

The lemma now follows from the fact that these recursion relations are
identical with eq (35).

10 This result follows from the minimax principle for characteristic values.
Cf. Courant and Hilbert, "Methoden der Mathematischen Physik," 1, 2d ed.,
Berlin (1931), pp. 27-29.
Theorem 9.2. The roots $\nu_1<\nu_2<\cdots<\nu_j$ of

$$p_j(\lambda)=0$$

are the characteristic roots of $A$ relative to the subspace $\mathcal{A}_j$.
Furthermore,

$$\lambda_1\le\nu_1,\quad \lambda_2\le\nu_2,\quad\ldots,\quad \lambda_j\le\nu_j,$$

the equality holding for all indices if and only if $j=r$.

This theorem follows from theorem 9.1 with $\mathcal{M}=\mathcal{A}_r$,
$B=A$. The last statement follows from the fact that $\mathcal{A}_j$, $j<r$, is
not invariant.
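The inequalities of theorem 9.2 can be illustrated for $j=2$, where $p_2(\lambda)=0$ is a quadratic. The numbers below are assumptions: $\mu_1$, $\mu_2$, $t_2^2$ are the values produced by the recursion (33) for the diagonal matrix with roots 1, 2, 4 and the starting vector (1, 1, 1).

```python
import math

# Assumed data for A = diag(1, 2, 4), z_1 = (1, 1, 1).
lams = [1.0, 2.0, 4.0]
mu1, mu2 = 7.0 / 3.0, 59.0 / 21.0
t2_sq = 14.0 / 9.0

# p_2(lam) = (lam - mu1)(lam - mu2) - t2^2 = lam^2 - s*lam + q
s = mu1 + mu2
q = mu1 * mu2 - t2_sq
root_of_disc = math.sqrt(s * s - 4.0 * q)
nu1 = (s - root_of_disc) / 2.0      # smallest root relative to the subspace A_2
nu2 = (s + root_of_disc) / 2.0      # next root relative to A_2
```

For this example $\nu_1\approx 1.3017$ and $\nu_2\approx 3.8412$, each strictly above the corresponding $\lambda$, as expected for $j<r$.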

X. Extension of Method of Fixed $\alpha$

We return now to the sequence $\{x^i\}$ of eq (12) with
$\gamma^i=\alpha=\text{const}$. As in theorem 5.2 we impose the condition
(23)^11 on $\alpha$. We rename $x^i$ as $z_1^i$ and apply the results of the
previous section to each $z_1^i$. The expansion (31) becomes (16),

$$z_1^i=a_1^iy_1+a_2^iy_2+\cdots+a_r^iy_r.$$

By lemma 5.1 each $a_j^i$ is positive; accordingly, the space $\mathcal{A}_r$
associated with $z_1^i$ is independent of $i$ and coincides with our original
space $\mathcal{A}_r$. We recognize $z_2^i$ as our original $\xi^i$.

The polynomials of eq (35) now depend upon the index $i$; we denote them by
$p_j^i(\lambda)$. For each $i$ they are defined in terms of

$$\mu_j^i=\mu(z_j^i),\quad j=1,2,\ldots,r,\qquad t_j^i=\frac{|z_j^i|}{|z_{j-1}^i|},\quad j=2,\ldots,r.$$

It is convenient to introduce

$$P_0(\lambda)=1,\qquad P_j(\lambda)=(\lambda-\lambda_j)(\lambda-\lambda_{j-1})\cdots(\lambda-\lambda_1),\quad j=1,2,\ldots,r.$$

The vectors $(z_1^i,z_2^i,\ldots,z_r^i)$ comprise a sequence of orthogonal
bases for the space $\mathcal{A}_r$. We show that, when normalized, this
sequence converges to the fixed basis $y_1,\ldots,y_r$.

Theorem 10.1. Let the constant $\alpha$ satisfy (23) and let the initial
vector $z_1^0=x^0$ be given by (4). Determine the infinite sequences
$\{z_1^i\},\{z_2^i\},\ldots,\{z_r^i\}$ by (33) and

$$z_1^{i+1}=z_1^i-\alpha z_2^i.$$

Then

$$\lim_{i\to\infty}\frac{z_j^i}{|z_j^i|}=y_j,\qquad \lim_{i\to\infty}\mu(z_j^i)=\lambda_j,\qquad j=1,2,\ldots,r.$$

11 In this section and the next we assume without further comment that this
condition holds.



The proof will be made by induction. For an integer $k$, $2\le k\le r$,
consider the statement

(37)

By theorem 5.1, (5.2), lemma 5.3 and theorem 5.2 we see that the statement is
true for $k=2$. Assuming the statement true for $k<r$, we shall show that it
holds for $k+1$. This will provide a proof of the theorem.

From the second eq (37), the definition of $t_j^i$, and lemma 5.2 it follows
that

$$t_j^i\to 0,\qquad j=2,\ldots,k.$$

Also by the first eq (37)

$$\mu_j^i\to\lambda_j,\qquad j=1,2,\ldots,k.$$

Consequently, from eq (35), $p_1^i(\lambda)\to\lambda-\lambda_1=P_1(\lambda)$,
$p_2^i(\lambda)\to(\lambda-\lambda_2)P_1(\lambda)=P_2(\lambda)$, and generally,

$$p_j^i(\lambda)\to P_j(\lambda),\qquad j=1,2,\ldots,k.$$

The vector under examination is

$$z_{k+1}^i=p_k^i(\lambda_1)a_1^iy_1+p_k^i(\lambda_2)a_2^iy_2+\cdots+p_k^i(\lambda_r)a_r^iy_r. \tag{38}$$

From the orthogonality relation $(z_j^i,z_{k+1}^i)=0$, $j=1,\ldots,k$, and eq
(36),

$$\sum_{m=1}^{r}p_0^i(\lambda_m)p_k^i(\lambda_m)(a_m^i)^2=0,\quad\ldots,\quad \sum_{m=1}^{r}p_{k-1}^i(\lambda_m)p_k^i(\lambda_m)(a_m^i)^2=0.$$

Dividing by $(a_{k+1}^i)^2$ and writing

$$\bar{a}_j^i=\frac{p_k^i(\lambda_j)(a_j^i)^2}{(a_{k+1}^i)^2},\qquad j=1,2,\ldots,k,$$

these equations take the form

$$\begin{aligned}
p_0^i(\lambda_1)\bar{a}_1^i+p_0^i(\lambda_2)\bar{a}_2^i+\cdots+p_0^i(\lambda_k)\bar{a}_k^i&=b_0^i\\
&\;\;\vdots\\
p_{k-1}^i(\lambda_1)\bar{a}_1^i+p_{k-1}^i(\lambda_2)\bar{a}_2^i+\cdots+p_{k-1}^i(\lambda_k)\bar{a}_k^i&=b_{k-1}^i.
\end{aligned}$$

By lemma 5.2,

$$b_j^i\to -P_j(\lambda_{k+1})P_k(\lambda_{k+1}).$$

As $i\to\infty$ the matrix of coefficients $(p_j^i(\lambda_l))$ tends to
$(P_j(\lambda_l))$. The latter matrix has only zeros below the main diagonal;
its determinant is the product of the diagonal elements, namely,

$$P_0(\lambda_1)P_1(\lambda_2)\cdots P_{k-1}(\lambda_k).$$

From the definition of the polynomials $P_j$ and the fact that the
$\lambda_j$ are distinct, it follows that this quantity does not vanish.
Consequently, for $i$ sufficiently large, the above linear equations in
$\bar{a}_j^i$ may be solved for these unknowns. Furthermore, as $i$ tends to
infinity, the solutions have finite limits, say $L_j$. Hence

$$\bar{a}_j^i\to L_j,\qquad j=1,2,\ldots,k. \tag{39}$$

It is simple to compute $L_k$; allowing $i\to\infty$, we obtain, from the last
equation,

$$L_k=-\frac{P_k(\lambda_{k+1})P_{k-1}(\lambda_{k+1})}{P_{k-1}(\lambda_k)}.$$

For our purposes it is not essential to know more of the limiting values $L_j$
than the fact that they exist. As a matter of interest they are evaluated at
the end of the section.

Now divide both sides of eq (38) by $a_{k+1}^i$. From lemma 5.2 and eq (39),

$$\frac{z_{k+1}^i}{a_{k+1}^i}\to P_k(\lambda_{k+1})\,y_{k+1}.$$

From this we immediately obtain the eq (37) for $j=k+1$. This completes the
proof.
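The convergence asserted in theorem 10.1 can be observed numerically for a few iterations. The following is a rough sketch under stated assumptions, not the authors' computation: the matrix is diagonal with roots 1, 2, 2.8 (so that $y_j$ are the coordinate vectors), the constant 0.5 is assumed admissible because it is below the reciprocal of the spread 1.8 of the roots, and the function name `basis` is hypothetical. Only a dozen steps are taken because the components along the higher vectors shrink rapidly in floating-point arithmetic.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    return [dot(row, x) for row in A]

def basis(A, z1, r):
    """Return [z_1, ..., z_r] from the recursion (33)."""
    zs = [z1]
    for j in range(1, r):
        z = zs[-1]
        Az = matvec(A, z)
        mu = dot(z, Az) / dot(z, z)
        nxt = [a - mu * b for a, b in zip(Az, z)]
        if j > 1:
            prev = zs[-2]
            t_sq = dot(z, z) / dot(prev, prev)
            nxt = [a - t_sq * b for a, b in zip(nxt, prev)]
        zs.append(nxt)
    return zs

lams = [1.0, 2.0, 2.8]                      # assumed characteristic roots
A = [[lams[0], 0.0, 0.0], [0.0, lams[1], 0.0], [0.0, 0.0, lams[2]]]
alpha = 0.5                                 # assumed admissible constant
z1 = [1.0, 1.0, 1.0]

for _ in range(12):
    z2 = basis(A, z1, 2)[1]                 # z_2 is the gradient at z_1
    z1 = [a - alpha * b for a, b in zip(z1, z2)]

zs = basis(A, z1, 3)
units = [[c / math.sqrt(dot(z, z)) for c in z] for z in zs]
mus = [dot(z, matvec(A, z)) / dot(z, z) for z in zs]
```

After these steps the normalized vectors in `units` are close to the coordinate vectors and `mus` is close to `lams`.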

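For future use we record some of the results of the above proof in the
following corollary.

Corollary. For $j=0,1,\ldots,r$,

$$p_j^i(\lambda)\to P_j(\lambda),$$

and, for $j\ge 1$,

$$\frac{|z_j^i|}{a_j^i}\to P_{j-1}(\lambda_j).$$

Also, for $j=1,2,\ldots,l-1$, $l=2,3,\ldots,r$,

$$\frac{p_{l-1}^i(\lambda_j)(a_j^i)^2}{(a_l^i)^2}$$

has a finite limit; this limit has the value

$$-\frac{P_{l-1}(\lambda_l)P_{l-2}(\lambda_l)}{P_{l-2}(\lambda_{l-1})}$$

when $j=l-1$.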

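We return to the evaluation of $L_j$ of eq (39). Let

$$\beta_j=\frac{L_j}{P_k(\lambda_{k+1})},\qquad j=1,2,\ldots,k. \tag{40}$$

Then the $\beta_j$ satisfy the limiting equations

$$\begin{aligned}
\beta_1+\beta_2+\cdots+\beta_k&=-1\\
P_1(\lambda_1)\beta_1+P_1(\lambda_2)\beta_2+\cdots+P_1(\lambda_k)\beta_k&=-P_1(\lambda_{k+1})\\
&\;\;\vdots\\
P_{k-1}(\lambda_1)\beta_1+P_{k-1}(\lambda_2)\beta_2+\cdots+P_{k-1}(\lambda_k)\beta_k&=-P_{k-1}(\lambda_{k+1}).
\end{aligned}$$

In the second equation, $P_1(\lambda)$ is a polynomial of degree 1; by adding
$\lambda_1$ times the first equation to the second we may reduce the second to

$$\lambda_1\beta_1+\lambda_2\beta_2+\cdots+\lambda_k\beta_k=-\lambda_{k+1}.$$

In the third equation, $P_2(\lambda)$ is a second-degree polynomial; a linear
combination of the preceding two rows will reduce this equation to

$$\lambda_1^2\beta_1+\lambda_2^2\beta_2+\cdots+\lambda_k^2\beta_k=-\lambda_{k+1}^2.$$

Continuing in this way, we arrive at the equivalent system,

$$\begin{aligned}
\beta_1+\beta_2+\cdots+\beta_k&=-1\\
\lambda_1\beta_1+\lambda_2\beta_2+\cdots+\lambda_k\beta_k&=-\lambda_{k+1}\\
&\;\;\vdots\\
\lambda_1^{k-1}\beta_1+\lambda_2^{k-1}\beta_2+\cdots+\lambda_k^{k-1}\beta_k&=-\lambda_{k+1}^{k-1}.
\end{aligned}$$

The solutions $\beta_j$ are immediately expressible in terms of the function

$$V_k(b_1,b_2,\ldots,b_k)=\prod_{\substack{l<m\\ l,m=1,2,\ldots,k}}(b_m-b_l).$$

From the identity

$$V_k(b_1,b_2,\ldots,b_k)=\begin{vmatrix}
1 & 1 & \cdots & 1\\
b_1 & b_2 & \cdots & b_k\\
\vdots & \vdots & & \vdots\\
b_1^{k-1} & b_2^{k-1} & \cdots & b_k^{k-1}
\end{vmatrix}$$

it follows that

$$\begin{aligned}
\beta_1&=(-1)^{k}\,\frac{V_k(\lambda_2,\ldots,\lambda_k,\lambda_{k+1})}{V_k(\lambda_1,\lambda_2,\ldots,\lambda_k)}\\
\beta_2&=(-1)^{k-1}\,\frac{V_k(\lambda_1,\lambda_3,\ldots,\lambda_k,\lambda_{k+1})}{V_k(\lambda_1,\lambda_2,\ldots,\lambda_k)}\\
&\;\;\vdots\\
\beta_k&=-\frac{V_k(\lambda_1,\ldots,\lambda_{k-1},\lambda_{k+1})}{V_k(\lambda_1,\lambda_2,\ldots,\lambda_k)}.
\end{aligned}$$

The $L_j$ may now be computed from eq (40).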
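The determinant-ratio form of the solution can be checked against the power-sum system directly. The sketch below is illustrative: the roots 1, 2, 4, 7 are arbitrary assumed values, the sign $(-1)^{k-j+1}$ follows from Cramer's rule together with the column interchanges needed to bring the arguments into the stated order, and the function names are hypothetical.

```python
from fractions import Fraction

def vandermonde(bs):
    """V(b_1, ..., b_n) = product over l < m of (b_m - b_l)."""
    prod = 1
    for m in range(len(bs)):
        for l in range(m):
            prod *= bs[m] - bs[l]
    return prod

def betas(lams, k):
    """beta_1..beta_k from lambda_1..lambda_{k+1} as ratios of V_k values."""
    base = vandermonde(lams[:k])
    out = []
    for j in range(1, k + 1):
        # Drop lambda_j and append lambda_{k+1} at the end.
        args = lams[:j - 1] + lams[j:k] + [lams[k]]
        sign = (-1) ** (k - j + 1)
        out.append(Fraction(sign * vandermonde(args), base))
    return out

lams = [1, 2, 4, 7]     # assumed distinct roots lambda_1 < ... < lambda_{k+1}
k = 3
b = betas(lams, k)

# Residuals of sum_m lambda_m^p * beta_m = -lambda_{k+1}^p for p = 0..k-1.
residuals = [sum(lams[m] ** p * b[m] for m in range(k)) + lams[k] ** p
             for p in range(k)]
```

For these roots the solution is $\beta=(-5,\,9,\,-5)$ and every residual vanishes.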



XI. Rate of Convergence

We shall establish some results on rate of convergence for the sequences of
the previous section. The terminology is that of section 6. Recall the
definition of $\delta_j$, $j=1,2,\ldots,r$, in eq (25). It is convenient to set

$$\delta_{r+1}=0.$$

We remark that for the lengths $|z_j^i|$ we have at once, from the corollary
to theorem 10.1 and lemma 6.2, that $(|z_j^i|)$ is a sequence with ratio
$\delta_j$.

Theorem 11.1. For a fixed integer $j$, $j=2,3,\ldots,r$, consider the sequence
$\{\mu(z_j^i)\}$. (1) If $\delta_j^2>\delta_{j-1}\delta_{j+1}$, then
$\{\lambda_j-\mu(z_j^i)\}$ is a positive, monotonic sequence with ratio
$(\delta_j/\delta_{j-1})^2$. (2) If $\delta_j^2<\delta_{j-1}\delta_{j+1}$, then
$\{\mu(z_j^i)-\lambda_j\}$ is a positive, monotonic sequence with ratio
$(\delta_{j+1}/\delta_j)^2$.

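We first assume $j<r$. Then from eq (36)

$$\mu(z_j^i)-\lambda_j=\frac{(z_j^i,Az_j^i)-\lambda_j|z_j^i|^2}{|z_j^i|^2}
=\frac{1}{|z_j^i|^2}\Big\{(\lambda_1-\lambda_j)[p_{j-1}^i(\lambda_1)a_1^i]^2+\cdots+(\lambda_r-\lambda_j)[p_{j-1}^i(\lambda_r)a_r^i]^2\Big\}$$

$$=\frac{(a_j^i)^2}{|z_j^i|^2}\Big\{b_1^i\Big(\frac{a_j^i}{a_1^i}\Big)^2+\cdots+b_{j-1}^i\Big(\frac{a_j^i}{a_{j-1}^i}\Big)^2+b_{j+1}^i\Big(\frac{a_{j+1}^i}{a_j^i}\Big)^2+\cdots+b_r^i\Big(\frac{a_r^i}{a_j^i}\Big)^2\Big\},$$

where $b_l^i=(\lambda_l-\lambda_j)\big[p_{j-1}^i(\lambda_l)(a_l^i)^2/(a_j^i)^2\big]^2$
for $l<j$ and $b_l^i=(\lambda_l-\lambda_j)[p_{j-1}^i(\lambda_l)]^2$ for $l>j$;
by the corollary to theorem 10.1, the $b_l^i$ have finite limits.

Now suppose $\delta_j^2>\delta_{j-1}\delta_{j+1}$. We write

$$\big(\mu(z_j^i)-\lambda_j\big)\Big(\frac{a_{j-1}^i}{a_j^i}\Big)^2=\frac{(a_j^i)^2}{|z_j^i|^2}\Big\{b_1^i\Big(\frac{a_{j-1}^i}{a_1^i}\Big)^2+\cdots+b_{j-2}^i\Big(\frac{a_{j-1}^i}{a_{j-2}^i}\Big)^2+b_{j-1}^i+b_{j+1}^i\Big(\frac{a_{j-1}^ia_{j+1}^i}{(a_j^i)^2}\Big)^2+\cdots\Big\}. \tag{41}$$

The factors of $b_l^i$ (except for $b_{j-1}^i$) may be recognized as sequences
of ratio less than 1, by lemma 6.1; hence these sequences tend to 0. Hence, by
the above-mentioned corollary, the left side of eq (41) has the limit

$$-(\lambda_j-\lambda_{j-1})\,\frac{P_{j-2}^2(\lambda_j)}{P_{j-2}^2(\lambda_{j-1})}<0.$$

Using lemma 6.2 we deduce the first conclusion of the theorem.

Suppose next that $\delta_j^2<\delta_{j-1}\delta_{j+1}$. Then we write

$$\big(\mu(z_j^i)-\lambda_j\big)\Big(\frac{a_j^i}{a_{j+1}^i}\Big)^2=\frac{(a_j^i)^2}{|z_j^i|^2}\Big\{b_1^i\Big(\frac{(a_j^i)^2}{a_1^ia_{j+1}^i}\Big)^2+\cdots+b_{j-1}^i\Big(\frac{(a_j^i)^2}{a_{j-1}^ia_{j+1}^i}\Big)^2+b_{j+1}^i+b_{j+2}^i\Big(\frac{a_{j+2}^i}{a_{j+1}^i}\Big)^2+\cdots\Big\}.$$

All terms tend to 0 except $b_{j+1}^i$. Hence the left side has the limit

$$(\lambda_{j+1}-\lambda_j)\,\frac{P_{j-1}^2(\lambda_{j+1})}{P_{j-1}^2(\lambda_j)}>0.$$

From this we deduce the second conclusion of the theorem.

There remains the case $j=r$. In this instance only case (1) of the theorem
is possible. The proof is identical with that given above for that case,
except that no terms $b_{j+1}^i$ and higher appear.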

Theorem 11.1 leaves open the possibility of the equality
$\delta_j^2=\delta_{j-1}\delta_{j+1}$. The likelihood of this condition
holding in a numerical instance is very slight, but the problem is subtler
than the instances of inequality and thereby has theoretical interest. We
establish the following corollary.

Corollary. Suppose $\delta_j^2=\delta_{j-1}\delta_{j+1}$. Then

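To prove the corollary we require the following lemma.

Lemma 11.1. If $\delta_j^2=\delta_{j-1}\delta_{j+1}$ then the sequence

$$\frac{a_{j-1}^i\,a_{j+1}^i}{(a_j^i)^2}$$

has a finite positive limit.

We remark that although the sequence in question has ratio 1 it does not
follow from this that it has a limit.

By eq (16)

$$\frac{a_{j-1}^i\,a_{j+1}^i}{(a_j^i)^2}=\frac{a_{j-1}^0\,a_{j+1}^0}{(a_j^0)^2}\prod_{k=0}^{i-1}\frac{[1-\alpha(\lambda_{j-1}-\mu^k)][1-\alpha(\lambda_{j+1}-\mu^k)]}{[1-\alpha(\lambda_j-\mu^k)]^2} \tag{42}$$

where $\mu^k=\mu(z_1^k)$. The general term of the product may be written

$$\frac{(\delta_{j-1}+\alpha\epsilon^k)(\delta_{j+1}+\alpha\epsilon^k)}{(\delta_j+\alpha\epsilon^k)^2}.$$

Letting

$$\epsilon^k=\mu^k-\lambda_1,\quad K_1=\frac{\alpha}{\delta_{j-1}},\quad K_2=\frac{\alpha}{\delta_j},\quad K_3=\frac{\alpha}{\delta_{j+1}}$$

the general term becomes

$$\frac{(1+K_1\epsilon^k)(1+K_3\epsilon^k)}{(1+K_2\epsilon^k)^2}=1+b^k$$

where

$$b^k=\epsilon^k(K_1+K_3-2K_2)+\text{higher terms in }\epsilon^k.$$

The coefficient of $\epsilon^k$ is positive. For, with
$\rho=\delta_{j-1}/\delta_j$ (so that $\delta_{j+1}=\delta_j/\rho$),

$$K_1+K_3-2K_2=\frac{\alpha}{\delta_j\,\rho}(\rho-1)^2>0.$$

It follows that for sufficiently large $k$, $b^k$ is positive. Furthermore, by
theorem 6.1 and lemma 6.2, $b^k$ is a sequence with ratio $\delta_2^2$. Hence
$\sum_k b^k$ converges; thus the product $\prod_k(1+b^k)$, and hence eq (42),
converges to a positive limit as $i\to\infty$.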

We turn to the proof of the corollary. As in the proof of the theorem we form
eq (41). We see that all terms tend to 0 except the terms in $b_{j-1}^i$ and
$b_{j+1}^i$. The first of these has a finite limit, and, by lemma 11.1, the
second does, too. This completes the proof.

Notice that for the corollary we cannot deduce that $\mu(z_j^i)-\lambda_j$ is
a sequence with a ratio; this is because we do not know that the right side of
eq (41) has a nonzero limit. In fact, it seems likely that this limit is zero.

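Theorem 11.2. Let $j$ be a fixed integer, $j=2,3,\ldots,r$. Set
$u_j^i=z_j^i/|z_j^i|$.

(1) If $\delta_j^2>\delta_{j-1}\delta_{j+1}$, then $|u_j^i-y_j|$ has ratio
$\delta_j/\delta_{j-1}$.

(2) If $\delta_j^2<\delta_{j-1}\delta_{j+1}$, then $|u_j^i-y_j|$ has ratio
$\delta_{j+1}/\delta_j$.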
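From the corollary to theorem 10.1 we see that

$$\frac{|z_j^i|}{p_{j-1}^i(\lambda_j)a_j^i}-1$$

tends to 0. We wish an estimate of the rate of convergence. By eq (36),

$$\left[\frac{|z_j^i|}{p_{j-1}^i(\lambda_j)a_j^i}\right]^2=1+\frac{1}{[p_{j-1}^i(\lambda_j)]^2}\Big\{[p_{j-1}^i(\lambda_1)a_1^i/a_j^i]^2+\cdots+[p_{j-1}^i(\lambda_{j-1})a_{j-1}^i/a_j^i]^2+[p_{j-1}^i(\lambda_{j+1})a_{j+1}^i/a_j^i]^2+\cdots\Big\}.$$

Denote by $c^i$ the quantity $|z_j^i|/\big(p_{j-1}^i(\lambda_j)a_j^i\big)$ whose
square appears on the left. Under case (1) of the theorem we multiply both
sides by

$$d_j^i=(a_{j-1}^i/a_j^i)^2.$$

As in the proof of theorem 11.1 we find that $\big((c^i)^2-1\big)d_j^i$ has a
nonzero limit. Noting that $1-(1/c^i)=\big((c^i)^2-1\big)/c^i(1+c^i)$ and using
$c^i\to 1$, we find that

$$\Big(1-\frac{1}{c^i}\Big)d_j^i \tag{43}$$

has a nonzero limit.

On the other hand, under case (2) we find, by a similar argument, that

$$\Big(1-\frac{1}{c^i}\Big)d_{j+1}^i \tag{44}$$

has a nonzero limit.

We have

$$|u_j^i-y_j|^2=\frac{1}{|z_j^i|^2}\Big\{[p_{j-1}^i(\lambda_1)a_1^i]^2+\cdots+[p_{j-1}^i(\lambda_{j-1})a_{j-1}^i]^2+[p_{j-1}^i(\lambda_{j+1})a_{j+1}^i]^2+\cdots\Big\}+\Big(\frac{1}{c^i}-1\Big)^2.$$

Consider condition (1) of the theorem. Multiply both sides of the above
equation by $d_j^i$. From eq (43) and the corollary to theorem 10.1 we obtain
that

$$|u_j^i-y_j|^2\,d_j^i \tag{45}$$

has a nonzero limit.

This proves the first part of the theorem. Consider next condition (2). This
time we multiply both sides of the above equation by $d_{j+1}^i$. Using eq
(44) we find

$$|u_j^i-y_j|^2\,d_{j+1}^i$$

has a nonzero limit. This completes the proof of the theorem.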

Corollary. Suppose $\delta_j^2=\delta_{j-1}\delta_{j+1}$. Then

Using lemma 11.1 we show as in the proof of the above theorem that the left
side of eq (43) has a finite limit. Proceeding further as in the theorem we
show that the left side of eq (45) also has a finite limit. This completes the
argument.

The following result is of interest in connection 
with the preceding theorems. 

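Theorem 11.3. Suppose $r\ge 3$. Let $j$ be an integer with $2\le j\le r-1$. If

$$\lambda_{j+1}-\lambda_j>\lambda_j-\lambda_{j-1}$$

then

$$\delta_j^2>\delta_{j-1}\delta_{j+1}.$$

Let $q_j=(\lambda_j-\lambda_1)/M$. From the definition of the $\delta_j$,

$$\delta_j^2-\delta_{j-1}\delta_{j+1}=(1-\beta q_j)^2-(1-\beta q_{j-1})(1-\beta q_{j+1})=\beta D+\beta^2(q_j^2-q_{j-1}q_{j+1})$$

where

$$D=(q_{j+1}-q_j)-(q_j-q_{j-1}).$$

We find that

$$\delta_j^2-\delta_{j-1}\delta_{j+1}=\beta(1-\beta q_{j-1})D+\beta^2(q_j-q_{j-1})^2.$$

Since $0<\beta<1$, and $0\le q_{j-1}<1$, it follows that the coefficient of $D$
is positive. Hence the left side is positive if $D$ is. This concludes the
proof.

Notice that if $j=r$, the conclusion of the theorem holds, for
$\delta_{r+1}=0$. Also, if $r=2$, then $j=2$ is the only meaningful value of
$j$ and the conclusion likewise holds.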

Theorem 9.2 enables us to prove an interesting result for the errors
$\lambda_j-\mu_j^i$.

Theorem 11.4. Let

$$e_j^i=\lambda_j-\mu_j^i,\qquad j=1,2,\ldots,r;\quad i=0,1,2,\ldots.$$

Then for each $i$,

$$\begin{aligned}
e_1^i+e_2^i+\cdots+e_j^i&<0,\qquad j=1,2,\ldots,r-1,\\
e_1^i+e_2^i+\cdots+e_r^i&=0.
\end{aligned}$$

Consider a fixed $j$, $j=1,2,\ldots,r$, and a fixed $i$. As in theorem 9.2
let $\nu_1,\ldots,\nu_j$ denote the roots of $p_j^i(\lambda)=0$. From the
definition of the $j$th degree polynomial $p_j^i(\lambda)$ we can verify
directly that the coefficient of $\lambda^{j-1}$ is
$-(\mu_1^i+\cdots+\mu_j^i)$. Hence

$$\sum_{k=1}^{j}\nu_k=\sum_{k=1}^{j}\mu_k^i.$$

From theorem 9.2,

$$\sum_{k=1}^{j}\lambda_k\le\sum_{k=1}^{j}\nu_k=\sum_{k=1}^{j}\mu_k^i,$$

the equality holding just in case $j=r$.
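The sign pattern of the partial error sums can be confirmed in exact arithmetic, both for a starting vector and after one fixed-step iteration. This is an illustrative sketch: the diagonal matrix with roots 1, 2, 4, the starting vector (1, 1, 1), and the constant 1/4 (assumed admissible, being less than the reciprocal of the spread 3 of the roots) are assumptions, and the function names are hypothetical.

```python
from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    return [dot(row, x) for row in A]

def mu_values(A, z1, r):
    """mu_1, ..., mu_r obtained along the recursion (33)."""
    out, prev, z = [], None, z1
    for _ in range(r):
        Az = matvec(A, z)
        mu = dot(z, Az) / dot(z, z)
        out.append(mu)
        nxt = [a - mu * b for a, b in zip(Az, z)]
        if prev is not None:
            t_sq = dot(z, z) / dot(prev, prev)
            nxt = [a - t_sq * b for a, b in zip(nxt, prev)]
        prev, z = z, nxt
    return out

def partial_error_sums(lams, mus):
    """[e_1, e_1 + e_2, ...] with e_j = lambda_j - mu_j."""
    sums, total = [], Fraction(0)
    for lam, mu in zip(lams, mus):
        total += lam - mu
        sums.append(total)
    return sums

lams = [Fraction(1), Fraction(2), Fraction(4)]
A = [[lams[0], 0, 0], [0, lams[1], 0], [0, 0, lams[2]]]
z1 = [Fraction(1), Fraction(1), Fraction(1)]
alpha = Fraction(1, 4)

sums_0 = partial_error_sums(lams, mu_values(A, z1, 3))

# One step z_1 <- z_1 - alpha * z_2, where z_2 = A z_1 - mu_1 z_1.
Az1 = matvec(A, z1)
mu1 = dot(z1, Az1) / dot(z1, z1)
z1_next = [a - alpha * (b - mu1 * a) for a, b in zip(z1, Az1)]
sums_1 = partial_error_sums(lams, mu_values(A, z1_next, 3))
```

For the starting vector the partial sums are $-4/3$, $-15/7$, $0$; the last one vanishes because the $\mu_j$ and the $\lambda_j$ have the same total.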

XII. Concluding Remarks 

Associated with a vector $z_1$ is the chain of subspaces

$$\mathcal{A}_1\subset\mathcal{A}_2\subset\cdots\subset\mathcal{A}_r$$

of section 9, together with the corresponding polynomials $p_j(\lambda)$,
whose zeros give the characteristic roots of $A$ relative to $\mathcal{A}_j$.
Let $\nu_{1,j}$ denote a minimum solution of $p_j(\lambda)=0$. It is a
consequence of theorem 9.1 that

$$\nu_{1,1}\ge\nu_{1,2}\ge\cdots\ge\nu_{1,r}=\lambda_1.$$

Similarly, if $\nu_{2,j}$ denotes the next smallest solution, then

$$\nu_{2,2}\ge\nu_{2,3}\ge\cdots\ge\nu_{2,r}=\lambda_2.$$

Analogous results hold for the higher characteristic roots. Thus, the
characteristic roots of $A$ can be found by successively finding the zeros of
$p_2(\lambda)$, $p_3(\lambda)$, etc.^12 The procedure may be followed, for
example, after a good approximation $z_1$ has been obtained by, say, one of
the gradient schemes.

The subspaces and their polynomials may be used in other ways in conjunction
with the method of gradients. It was pointed out earlier that in the gradient
methods we pass in each step from $z_1$ to a next approximation $\bar{z}_1$
which lies in the subspace $\mathcal{A}_2$ associated with $z_1$. One might
consider choosing $\bar{z}_1$ in $\mathcal{A}_3$ or even a higher-dimensional
subspace. The extra labor involved might be justified, or even essential, in
an ill-conditioned problem in which several roots are clustered about
$\lambda_1$ (or $\lambda_r$).

To illustrate, suppose that $\lambda_1$, $\lambda_2$, $\lambda_3$ are close
together and reasonably isolated from the other roots. After a certain number
of iterations it would be expected that the subspace $\mathcal{A}_3$ spanned
by $z_1$, $z_2$, $z_3$ would contain only the characteristic vectors $y_1$,
$y_2$, $y_3$ to a good approximation. At this stage, although the $z$'s might
not be good individual approximations to the $y$'s, the characteristic vectors
$u_1$, $u_2$, $u_3$ relative to $\mathcal{A}_3$ would provide such
approximations. Further accuracy for $y_1$ would then be obtained by
continuing the iteration process with $u_1$.

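We proceed to derive the formulas for the characteristic roots
$\nu_1<\nu_2<\nu_3$ and characteristic vectors $u_1$, $u_2$, $u_3$ relative to
$\mathcal{A}_3$. The roots $\nu_k$ are the solutions of

$$p_3(\lambda)=(\lambda-\mu_1)(\lambda-\mu_2)(\lambda-\mu_3)-t_2^2(\lambda-\mu_3)-t_3^2(\lambda-\mu_1)=0.$$

This cubic is most easily solved by introducing the new variable $\rho$ and
the constants $\bar{\mu}$, $\sigma_k$ as follows:

$$\bar{\mu}=\tfrac{1}{3}(\mu_1+\mu_2+\mu_3),\qquad \sigma_k=\mu_k-\bar{\mu},\quad k=1,2,3,\qquad \lambda=\bar{\mu}+\rho.$$

The equation to be solved is now

$$\rho^3+b\rho+c=0,$$

where

$$b=\sigma_1\sigma_2+\sigma_2\sigma_3+\sigma_1\sigma_3-t_2^2-t_3^2,\qquad c=t_2^2\sigma_3+t_3^2\sigma_1-\sigma_1\sigma_2\sigma_3.$$

The solutions may now be determined rapidly by, say, Newton's method. To find
$u_1$, the characteristic vector relative to $\mathcal{A}_3$ belonging to
$\nu_1$, we write

$$u_1=z_1+\alpha z_2+\beta z_3$$

and attempt to find $\alpha$ and $\beta$. To do this observe first that since
$u_k$ is a characteristic vector relative to $\mathcal{A}_3$ belonging to
$\nu_k$,

$$(u,Au_k-\nu_ku_k)=0,\qquad u \text{ in } \mathcal{A}_3,\quad k=1,2,3. \tag{46}$$

Using eq (46) with $k=1$, and $u=z_1$, $u=z_3$ in turn we find, by eq (34),

$$\begin{aligned}
0&=(\mu_1-\nu_1)|z_1|^2+\alpha|z_2|^2\\
0&=\alpha|z_3|^2+\beta\mu_3|z_3|^2-\beta\nu_1|z_3|^2.
\end{aligned}$$

Thus

$$\alpha=\frac{\nu_1-\mu_1}{t_2^2},\qquad \beta=\frac{\alpha}{\nu_1-\mu_3},$$

so that

$$u_1=z_1+\frac{\nu_1-\mu_1}{t_2^2}\Big(z_2+\frac{z_3}{\nu_1-\mu_3}\Big).$$

The formulas for $u_2$ and $u_3$ are obtained by replacing $\nu_1$ by $\nu_2$
and $\nu_3$, respectively.

12 This is one of the methods developed by Lanczos.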
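The reduced cubic and the formula for $u_1$ can be tried on a case where the answer is known. The sketch below is an illustration under stated assumptions: the matrix is diagonal with roots 1, 2, 4 and the starting vector is (1, 1, 1), so that $r=3$, the relative roots coincide with the true ones, and $u_1$ must be a multiple of the first coordinate vector. Newton's method is started to the left of all roots (using a standard bound on the roots of a monic polynomial), which is one possible way of reaching the smallest root; the variable names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    return [dot(row, x) for row in A]

A = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 4.0]]
z1 = [1.0, 1.0, 1.0]

# Recursion (33) for z_2, z_3 together with mu_j and t_j^2.
Az1 = matvec(A, z1)
mu1 = dot(z1, Az1) / dot(z1, z1)
z2 = [a - mu1 * b for a, b in zip(Az1, z1)]
Az2 = matvec(A, z2)
mu2 = dot(z2, Az2) / dot(z2, z2)
t2_sq = dot(z2, z2) / dot(z1, z1)
z3 = [a - mu2 * b - t2_sq * c for a, b, c in zip(Az2, z2, z1)]
mu3 = dot(z3, matvec(A, z3)) / dot(z3, z3)
t3_sq = dot(z3, z3) / dot(z2, z2)

# Reduced cubic rho^3 + b*rho + c = 0 with lambda = mu_bar + rho.
mu_bar = (mu1 + mu2 + mu3) / 3.0
s1, s2, s3 = mu1 - mu_bar, mu2 - mu_bar, mu3 - mu_bar
b = s1 * s2 + s2 * s3 + s1 * s3 - t2_sq - t3_sq
c = t2_sq * s3 + t3_sq * s1 - s1 * s2 * s3

# Newton's method from a point to the left of every root.
rho = -(1.0 + max(abs(b), abs(c)))
for _ in range(100):
    rho -= (rho ** 3 + b * rho + c) / (3.0 * rho * rho + b)
nu1 = mu_bar + rho

# Coefficients of u_1 = z_1 + alpha*z_2 + beta*z_3.
alpha = (nu1 - mu1) / t2_sq
beta = alpha / (nu1 - mu3)
u1 = [p + alpha * q + beta * w for p, q, w in zip(z1, z2, z3)]
```

In this example `nu1` is 1 and `u1` is (3, 0, 0) up to rounding, a multiple of the first characteristic vector.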

Suppose the lower characteristic roots $\lambda_1, \lambda_2, \ldots$ and the
higher characteristic roots $\lambda_r, \lambda_{r-1}, \ldots$ have been
accurately calculated. The intermediate roots and vectors can then be
calculated by the gradient procedure (12) as follows. We apply the procedure
to an initial vector $x^0$, inducing (if necessary) and maintaining
orthogonality to the vectors $y_1, y_2, \ldots$ and $y_r, y_{r-1}, \ldots$ by
more or less frequent selection of $\gamma^i$ as $1/(\lambda_j-\mu^i)$, with
$\lambda_j$ ranging over the known roots (see section 8). Use of the subspaces
$\mathcal{A}_2, \mathcal{A}_3, \ldots$ may come into play as described above.
Notice that as more roots are found the constant $\alpha$ in the method of
fixed $\alpha$ may be chosen larger.

We remark that independent characteristic vectors belonging to a multiple
characteristic root can be determined by varying the initial vector $x^0$.

We conclude by pointing out that the results of theorem 10.1 remain valid if
the constant $\alpha$ satisfying eq (23) is replaced by the variable value of
eq (28) where, in addition to eq (29) and (30), the condition

$$\frac{\beta_2}{\beta_1}<\min_{j=3,4,\ldots,r}\frac{\lambda_j-\lambda_1}{\lambda_{j-1}-\lambda_1}$$

is imposed. The purpose of this restriction is to guarantee that
$a_k^i/a_j^i\to 0$ for $j<k$. The results on rate of convergence in section 11
may be modified to fit the new conditions.

Los Angeles, May 4, 1950. 